Emil Velikov [Mon, 18 Jan 2016 11:06:28 +0000 (13:06 +0200)]
mapi: include gl.xml in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 14 Jan 2016 07:28:21 +0000 (09:28 +0200)]
i965: adding missing headers to the dist tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Christian König [Sun, 13 Dec 2015 10:44:13 +0000 (11:44 +0100)]
st/va: add motion adaptive deinterlacing v2
v2: minor cleanup
Signed-off-by: Christian König <christian.koenig@amd.com>
Michel Dänzer [Fri, 15 Jan 2016 07:02:22 +0000 (16:02 +0900)]
gallium/radeon: Rename do_invalidate_resource to invalidate_buffer
And only call it from r600_invalidate_resource for buffer resources.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Fri, 15 Jan 2016 06:46:31 +0000 (15:46 +0900)]
st/dri: Don't call invalidate_resource for NULL depth/stencil buffers
Fixes crash in 4 EGL piglit tests with radeonsi.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Fri, 15 Jan 2016 03:18:29 +0000 (12:18 +0900)]
radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Fri, 15 Jan 2016 03:13:15 +0000 (12:13 +0900)]
radeonsi: Print "LLVM emitted unknown config register" warning only once
Say "LLVM" instead of "Compiler" for clarity.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Oded Gabbay [Sun, 17 Jan 2016 20:15:40 +0000 (22:15 +0200)]
llvmpipe: use vpkswss when dst is signed
This patch fixes a bug when building a pack instruction.
For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.
This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale
I've also corrected some coding style issues in the function.
v2: Returned else statements to vmware style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sun, 17 Jan 2016 04:23:35 +0000 (14:23 +1000)]
glsl: fix subroutine lowering reusing actual parmaters
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).
this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Timothy Arceri [Fri, 15 Jan 2016 02:45:49 +0000 (13:45 +1100)]
glsl: remove special case for detecting stream duplicates
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Timothy Arceri [Fri, 15 Jan 2016 02:45:48 +0000 (13:45 +1100)]
glsl: add missing explicit_stream flag to has_layout()
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Timothy Arceri [Sun, 17 Jan 2016 05:09:08 +0000 (16:09 +1100)]
mesa: fix segfault in glUniformSubroutinesuiv()
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:
"The command
void UniformSubroutinesuiv(enum shadertype, sizei count,
const uint *indices);
will load all active subroutine uniforms for shader stage
shadertype with subroutine indices from indices, storing
indices[i] into the uniform at location i. The indices for
any locations between zero and the value of
ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
used will be ignored."
V2: simplify NULL check suggested by Jason.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731
Timothy Arceri [Mon, 18 Jan 2016 00:13:27 +0000 (11:13 +1100)]
glsl: fix segfault linking subroutine uniform with explicit location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
Ilia Mirkin [Sun, 17 Jan 2016 21:24:02 +0000 (16:24 -0500)]
gm107/ir: don't do indirect frag shader inputs on GM107
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Sun, 17 Jan 2016 08:44:22 +0000 (03:44 -0500)]
tgsi: initialize Atomic field in tgsi_default_declaration
Spotted by Coverity.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ilia Mirkin [Sun, 17 Jan 2016 08:41:55 +0000 (03:41 -0500)]
nvc0: bsp_bo can't be null
We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Oded Gabbay [Sun, 17 Jan 2016 12:25:32 +0000 (14:25 +0200)]
llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.
_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.
_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"
To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Rob Clark [Sun, 17 Jan 2016 17:21:45 +0000 (12:21 -0500)]
freedreno/ir3: fix mad 3rd src delay calc
In
fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 16 Jan 2016 00:45:51 +0000 (19:45 -0500)]
freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 15 Jan 2016 23:22:40 +0000 (18:22 -0500)]
freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL CONST[0..7]
DCL ADDR[0]
0: ARL ADDR[0].x, IN[1].xxxx
1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
2: DP4 OUT[0].x, CONST[4], IN[0]
3: DP4 OUT[0].y, CONST[5], IN[0]
4: DP4 OUT[0].z, CONST[6], IN[0]
5: DP4 OUT[0].w, CONST[7], IN[0]
6: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 15 Jan 2016 22:07:02 +0000 (17:07 -0500)]
freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions. This may be slightly conservative, we may only have
this restriction when the first src is also const.
This fixes, for example, +24/-0 of the variable-indexing piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 12 Jan 2016 15:24:10 +0000 (10:24 -0500)]
ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope. Avoids
generating loads for a store.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 10 Jan 2016 19:10:08 +0000 (14:10 -0500)]
freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 10 Jan 2016 16:03:46 +0000 (11:03 -0500)]
freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 13 Jan 2016 18:07:51 +0000 (13:07 -0500)]
freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 9 Jan 2016 19:46:36 +0000 (14:46 -0500)]
freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 6 Jan 2016 19:08:34 +0000 (14:08 -0500)]
freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 6 Jan 2016 18:32:24 +0000 (13:32 -0500)]
freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 13 Jan 2016 23:39:56 +0000 (18:39 -0500)]
ttn: add missing writemask on store_output
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Fri, 15 Jan 2016 23:24:11 +0000 (18:24 -0500)]
nir/print: const_index is signed
Noticed this with $piglit/bin/vp-address-01
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Mon, 4 Jan 2016 18:24:08 +0000 (13:24 -0500)]
nir: few missing struct names
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'. But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.
So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Ilia Mirkin [Fri, 15 Jan 2016 22:12:27 +0000 (17:12 -0500)]
nv50/ir: add saturate support on ex2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Jeff Muizelaar [Sat, 16 Jan 2016 02:35:26 +0000 (03:35 +0100)]
gallivm: avoid crashing in mod by 0 with llvmpipe
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Fri, 15 Jan 2016 07:27:03 +0000 (23:27 -0800)]
glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions. Implementations of
current language revisions may or may not perform them.
This patch makes Mesa apply implicti conversions even on current
language versions. Applications appear to expect this behavior,
and there's really no downside to doing so.
Fixes shader compilation in Shadow of Mordor.
Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Fri, 15 Jan 2016 04:42:47 +0000 (20:42 -0800)]
i965/fs: Always set channel 2 of texture headers in some stages
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers. However, in compute,
geometry, and tessellation stages, the hardware is not so nice. In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15. As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask. This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled. Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 15 Jan 2016 04:27:51 +0000 (20:27 -0800)]
i965/fs/generator: Take an actual shader stage rather than a string
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 14 Jan 2016 20:08:57 +0000 (12:08 -0800)]
i965/vec4: Use UW type for multiply into accumulator on GEN8+
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."
Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Roland Scheidegger [Fri, 15 Jan 2016 02:10:40 +0000 (03:10 +0100)]
llvmpipe: ditch additional ref counting for vertex/geometry sampler views
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Fri, 15 Jan 2016 19:12:24 +0000 (20:12 +0100)]
llvmpipe: fix "leaking" textures
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...
v2: rename var
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Samuel Iglesias Gonsálvez [Tue, 12 Jan 2016 14:36:56 +0000 (15:36 +0100)]
glsl: restrict consumer stage condition to modify interpolation type
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.
If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.
Fixes the following CTS test:
ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate
Fixes the following dEQP tests:
dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Kenneth Graunke [Wed, 13 Jan 2016 23:07:18 +0000 (15:07 -0800)]
i965: Apply add_const_offset_to_base for vec4 VS inputs too.
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 13 Jan 2016 23:04:39 +0000 (15:04 -0800)]
i965: Make add_const_offset_to_base() work at the shader level.
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 13 Jan 2016 23:23:48 +0000 (15:23 -0800)]
i965: Make an is_scalar boolean in brw_compile_vs().
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Wed, 13 Jan 2016 21:32:44 +0000 (13:32 -0800)]
nir/builder: Add a nir_build_ivec4() convenience helper.
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tapani Pälli [Thu, 14 Jan 2016 12:10:59 +0000 (14:10 +0200)]
glsl: mark explicit uniforms as explicit in other stages too
If shader declares uniform explicit location in one stage but
implicit in another, explicit location should be used. Patch marks
implicit uniforms as explicit if they were explicit in previous stage.
This makes sure that we don't treat them implicit later when assigning
locations.
Fixes following CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-implicit-in-some-stages3
v2: move check to cross_validate_globals (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Francisco Jerez [Sun, 3 Jan 2016 18:06:52 +0000 (10:06 -0800)]
i965/gen7.5+: Disable resource streamer during GPGPU workloads.
The RS and hardware binding tables are only supported on the 3D
pipeline and can lead to corruption if left enabled during a GPGPU
workload. Disable it when switching to the GPGPU (or media) pipeline
and re-enable it when switching back to the 3D pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Francisco Jerez [Sun, 3 Jan 2016 03:06:48 +0000 (19:06 -0800)]
i965/gen7: Emit stall and dummy primitive draw after switching to the 3D pipeline.
This hardware bug can supposedly lead to a hang on IVB and VLV.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sun, 3 Jan 2016 03:05:48 +0000 (19:05 -0800)]
i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.
AFAIK brw_emit_select_pipeline() is only called once during context
init on Gen4-5, at which point the pipeline is likely to be already
idle so it may just happen to work by luck regardless of the MI_FLUSH.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sun, 3 Jan 2016 03:02:09 +0000 (19:02 -0800)]
i965/gen6-7: Implement stall and flushes required prior to switching pipelines.
Switching the current pipeline while it's not completely idle or the
read and write caches aren't flushed can lead to corruption. Fixes
misrendering of at least the following Khronos CTS test:
ES31-CTS.shader_image_load_store.basic-allTargets-store-fs
The stall and flushes are no longer required on Gen8+.
v2: Emit PIPE_CONTROL with non-zero post-sync op before the write
cache flush on SNB due to hardware bug. (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sun, 3 Jan 2016 02:35:28 +0000 (18:35 -0800)]
i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.
This hardware bug can cause a hang on context restore while the
current pipeline is set to GPGPU (BDWGFX HSD
1909593). In addition to
clearing the valid bit, mark the CC state as dirty to make sure that
the CC indirect state pointer is re-emitted when we switch back to the
3D pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Wed, 23 Dec 2015 16:44:59 +0000 (18:44 +0200)]
i965: Add state bit to trigger re-emission of color calculator state.
This will be used on Gen8+ to make sure that the color calculator
state pointers are re-emitted when switching back to the 3D pipeline
after some GPGPU workload due to a hardware workaround. There are
other state bits already defined that could be used to achieve the
same effect but they all cause a ton of unrelated state to be
re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new
one, state bits are cheap.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia Mirkin [Thu, 14 Jan 2016 06:09:25 +0000 (01:09 -0500)]
nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:
total local used in shared programs : 54116 -> 30372 (-43.88%)
Probably modest advantage to execution, but it's an imporant
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 12 Jan 2016 19:41:52 +0000 (14:41 -0500)]
nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.
This helps a lot of Metro 2033 Redux and a handful of KSP shaders:
total instructions in shared programs :
6288373 ->
6261517 (-0.43%)
total gprs used in shared programs : 944051 -> 945131 (0.11%)
total local used in shared programs : 54116 -> 54116 (0.00%)
A typical case is for register usage to double and for instructions to
halve. A future commit can also optimize local memory usage size to be
reduced with better packing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 11 Jan 2016 21:41:18 +0000 (16:41 -0500)]
nvc0/ir: be careful about propagating very large offsets into const load
Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.
This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.
Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 11 Jan 2016 21:39:15 +0000 (16:39 -0500)]
nvc0: allow fragment shader inputs to use indirect indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Thu, 14 Jan 2016 18:44:54 +0000 (13:44 -0500)]
st/mesa: use surface format to generate mipmaps when available
This fixes the recently posted mipmap + texture views piglit test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Wed, 13 Jan 2016 17:42:02 +0000 (18:42 +0100)]
radeonsi: don't miss changes to SPI_TMPRING_SIZE
I'm not sure about the consequences of this bug, but it's definitely
dangerous.
This applies to SI, CIK, VI.
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Charmaine Lee [Tue, 22 Dec 2015 19:20:41 +0000 (11:20 -0800)]
svga: add DXGenMips command support
For those formats that support hw mipmap generation, use the
DXGenMips command. Otherwise fallback to the mipmap generation utility.
Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)
v2: make sure the texture surface was created with the render target bind flag
set relocation flag to SVGA_RELOC_WRITE for the texture surface
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Mon, 21 Dec 2015 19:07:08 +0000 (11:07 -0800)]
svga: add num-generate-mipmap HUD query
The actual increment of the num-generate-mipmap counter will be done
in a subsequent patch when hw generate mipmap is supported.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Charmaine Lee [Thu, 14 Jan 2016 17:22:17 +0000 (10:22 -0700)]
gallium/st: add pipe_context::generate_mipmap()
This patch adds a new interface to support hardware mipmap generation.
PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify
if this new interface is supported; if not supported, the state tracker will
fallback to mipmap generation by rendering/texturing.
v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
v3: add format to the generate_mipmap interface to allow mipmap generation
using a format other than the resource format
v4: fix return type of trace_context_generate_mipmap()
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Thu, 14 Jan 2016 17:38:17 +0000 (10:38 -0700)]
st/mesa: declare struct pipe_screen in st_cb_bufferobjects.h
To silence a compiler warning. Trivial.
Matt Turner [Wed, 13 Jan 2016 19:09:11 +0000 (11:09 -0800)]
nir: Lower bitfield_extract.
The OpenGL specifications for bitfieldExtract() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.
This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as
bitfieldExtract:
bits > 31 ? value : bfe(value, offset, bits)
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Matt Turner [Wed, 13 Jan 2016 19:08:37 +0000 (11:08 -0800)]
nir: Handle <bits>=32 case in bitfield_insert lowering.
The OpenGL specifications for bitfieldInsert() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.
This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as
bitfieldInsert:
bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Brian Paul [Wed, 13 Jan 2016 23:20:09 +0000 (16:20 -0700)]
st/mesa: add check for color logicop in blit_copy_pixels()
We check that a bunch of raster operations are disabled in
blit_copy_pixels(). We also need to check that color logicop is
disabled.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 12 Jan 2016 14:29:18 +0000 (09:29 -0500)]
gallium/radeon: do not reallocate user memory buffers
The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 14 Jan 2016 14:41:04 +0000 (09:41 -0500)]
gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 11 Jan 2016 00:54:44 +0000 (19:54 -0500)]
gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
This accomodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 9 Jan 2016 23:05:58 +0000 (18:05 -0500)]
st/mesa: implement Driver.InvalidateBufferSubData
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 11 Jan 2016 22:44:45 +0000 (17:44 -0500)]
st/mesa: use pipe->invalidate_resource instead of buffer re-allocation
Drivers are expected to avoid unnecessary work when possible in this code
path.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 11 Jan 2016 22:38:08 +0000 (17:38 -0500)]
gallium: add PIPE_CAP_INVALIDATE_BUFFER
It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 9 Jan 2016 22:53:07 +0000 (17:53 -0500)]
mesa: add Driver.InvalidateBufferSubData
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nicolai Hähnle [Sat, 9 Jan 2016 22:51:39 +0000 (17:51 -0500)]
mesa: fix the checks in _mesa_InvalidateBuffer(Sub)Data
Change the check to be in line with what the quoted spec fragment says.
I have sent out a piglit test for this as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nicolai Hähnle [Tue, 12 Jan 2016 17:28:11 +0000 (12:28 -0500)]
winsys/radeon: fix warnings about incompatible pointer types
Some confusion between pb_buffer and radeon_bo as well as between
radeon_drm_winsys and radeon_winsys.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Neil Roberts [Wed, 13 Jan 2016 19:28:45 +0000 (19:28 +0000)]
texobj: Check completeness with InternalFormat rather than Mesa format
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA and the other as RGB because they would both
have an RGBA Mesa format. If we check the InternalFormat instead then
we are directly checking the API usage which I think better matches
the intention of the check.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93700
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ben Widawsky [Wed, 30 Dec 2015 17:47:17 +0000 (09:47 -0800)]
i965: Remove unused hw_must_use_separate_stencil
I spotted this while looking for what needs updating in future platforms.
I'm too lazy to go through the git logs, but it was probably missed by Jason
when all the brw refactoring happened.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 14 Jan 2016 00:17:26 +0000 (16:17 -0800)]
i965: Drop extra newline from shader compile messages.
Ilia changed shader-db's run.c to not expect messages to contain a
newline in shader-db commit
51bbc8035.
Matt Turner [Fri, 8 Jan 2016 00:16:35 +0000 (16:16 -0800)]
nir: Change bfm's semantics to match Intel/AMD/SM5.
Intel/AMD's hardware instructions do not handle arguments of 32.
Constant evaluation should not produce a result different from the
hardware instruction.
The s/1ull/1u/ change is intentional: previously we wanted defined
behavior for the "1 << 32" case, but we're making this case undefined so
we can make it 1u and save ourselves a 64-bit operation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Wed, 30 Dec 2015 19:48:22 +0000 (14:48 -0500)]
glsl: Fix undefined shifts.
Shifting into the sign bit is undefined, as is shifting by 32.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Mon, 11 Jan 2016 18:54:19 +0000 (10:54 -0800)]
glsl: Handle failure of Python codegen scripts.
If a Python codegen script failed, it would write a zero-byte file,
which on subsequent invocations of make would trick it into thinking the
file was appropriately generated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 8 Jan 2016 00:01:51 +0000 (16:01 -0800)]
glsl, nir: Make ir_triop_bitfield_extract a vectorized operation.
We would like to be able to combine
result.x = bitfieldExtract(src0.x, src1.x, src2.x);
result.y = bitfieldExtract(src0.y, src1.y, src2.y);
result.z = bitfieldExtract(src0.z, src1.z, src2.z);
result.w = bitfieldExtract(src0.w, src1.w, src2.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all three operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Tue, 5 Jan 2016 12:01:11 +0000 (04:01 -0800)]
glsl, nir: Make ir_quadop_bitfield_insert a vectorized operation.
We would like to be able to combine
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all four operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Thu, 7 Jan 2016 23:54:16 +0000 (15:54 -0800)]
glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.
These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Matt Turner [Mon, 11 Jan 2016 20:13:24 +0000 (12:13 -0800)]
nir: Fix constant evaluation of bfm.
NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's
ir_binop_bfm takes <bits> as src0 and <offset> as src1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Matt Turner [Mon, 11 Jan 2016 17:34:50 +0000 (09:34 -0800)]
i965/fs: Skip assertion on NaN.
A shader in Unreal4 uses the result of divide by zero in its color
output, producing NaN and triggering this assertion since NaN is not
equal to itself.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Matt Turner [Fri, 27 Feb 2015 00:06:45 +0000 (16:06 -0800)]
i965/fs: Add debugging to constant combining pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Brian Paul [Tue, 12 Jan 2016 15:46:40 +0000 (08:46 -0700)]
meta: remove const qualifier on _mesa_meta_fb_tex_blit_begin()
To silence a compiler warning about a const/non-const mismatch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Brian Paul [Tue, 12 Jan 2016 01:22:50 +0000 (18:22 -0700)]
st/mesa: fix incorrect buffer token passed to _mesa_BindFramebuffer()
I added this code right at the end, and got it wrong.
Only used by the WGL_ARB_render_texture code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Emil Velikov [Wed, 13 Jan 2016 13:27:50 +0000 (15:27 +0200)]
docs: add news item and link release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 13 Jan 2016 13:23:53 +0000 (15:23 +0200)]
docs: add sha256 checksums for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
4b2d9f29e9b75cbbeb76ccf753a256e11f07ee1a)
Emil Velikov [Wed, 13 Jan 2016 10:11:33 +0000 (12:11 +0200)]
docs: add release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
330aa44a0da7548000a6b2fc2bb580e9c8e733cc)
Neil Roberts [Thu, 19 Nov 2015 15:25:21 +0000 (16:25 +0100)]
i965/gen9: Don't allow the RGBX formats for texturing/rendering
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Marek Olšák [Sat, 2 Jan 2016 22:09:58 +0000 (23:09 +0100)]
radeonsi: move POSITION and FACE fragment shader inputs to system values
And FACE becomes integer instead of float.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Marek Olšák [Thu, 7 Jan 2016 19:00:34 +0000 (20:00 +0100)]
radeonsi: simplify gl_FragCoord behavior
It will become a system value, not an input.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Samuel Iglesias Gonsálvez [Tue, 5 Jan 2016 12:21:17 +0000 (13:21 +0100)]
glsl: add image_format check in cross_validate_globals()
Fixes CTS test:
ES31-CTS.shader_image_load_store.negative-linkErrors
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93410
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tapani Pälli [Mon, 4 Jan 2016 07:55:52 +0000 (09:55 +0200)]
mesa: do not validate io of non-compute and compute stage
Fixes regression on SSO tests that have both non-compute and
compute programs in a program pipeline.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93532
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Tapani Pälli [Tue, 12 Jan 2016 09:03:14 +0000 (11:03 +0200)]
glsl: add packed varyings for outputs with single stage program
Commit
8926dc8 added a check where we add packed varyings of output
stage only when we have multiple stages, however duplicates are already
handled by changes in commit
0508d950 and we want to add outputs also in
case where we have only one stage.
Fixes regression caused by
8926dc8 for following test:
ES31-CTS.program_interface_query.separate-programs-vertex
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Roland Scheidegger [Wed, 13 Jan 2016 03:48:41 +0000 (04:48 +0100)]
llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts
some compiler was unhappy.
Roland Scheidegger [Wed, 6 Jan 2016 00:05:35 +0000 (01:05 +0100)]
llvmpipe: avoid most 64 bit math in rasterization
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane)) this
is never a problem.
The idea isnt't all that revolutionary, in fact something similar was tried
ages ago (
9773722c2b09d5f0615a47cecf4347859474dc56) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).
This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sun, 3 Jan 2016 00:13:45 +0000 (01:13 +0100)]
llvmpipe: scale up bounding box planes to subpixel precision
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sat, 2 Jan 2016 03:59:16 +0000 (04:59 +0100)]
llvmpipe: add sse code for fixed position calculation
This is quite a few less instructions, albeit still do the 2 64bit muls
with scalar c code (they'd need way more shuffles, plus fixup for the signed
mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed
scalar muls natively just fine after all (even on 32bit).
(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>