Jason Ekstrand [Wed, 18 Mar 2015 19:34:09 +0000 (12:34 -0700)]
nir: Use a list instead of a hash_table for inputs, outputs, and uniforms
We never did a single hash table lookup in the entire NIR code base that I
found, so there was no real benefit to doing it that way. I suppose that
for linking, we'll probably want to be able to look up by name, but we can
leave building that hash table to the linker. In the meantime, this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc. This was causing massive rendering issues in
the unreal4 Sun Temple demo.
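To illustrate why a list tolerates duplicate names where a name-keyed table would not, here is a minimal sketch. The struct var type and both helpers are hypothetical stand-ins, not the real NIR variable type or Mesa's list implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a shader variable; not the real NIR type. */
struct var {
   const char *name;
   int location;
   struct var *next;
};

/* Append to a singly linked list; entries with equal names are all kept. */
static void
var_list_append(struct var **head, struct var *v)
{
   assert(v != NULL);
   v->next = NULL;
   while (*head)
      head = &(*head)->next;
   *head = v;
}

/* Count the entries carrying a given name, to show nothing was dropped. */
static size_t
var_list_count_named(const struct var *head, const char *name)
{
   size_t n = 0;
   for (; head; head = head->next)
      if (strcmp(head->name, name) == 0)
         n++;
   return n;
}
```

Appending two variables that are both called "color" leaves two entries in the list, whereas inserting them into a table keyed by name would typically replace the first with the second.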
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Brian Paul [Thu, 19 Mar 2015 14:48:56 +0000 (08:48 -0600)]
gallivm: remove unused 'builder' variable
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Thu, 19 Mar 2015 14:10:19 +0000 (08:10 -0600)]
mesa: use more descriptive error messages for glUniform errors
Different errors for type mismatches, size mismatches and matrix/
non-matrix mismatches. Use a common format of "uniformName"@location
in the messages.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Matt Turner [Mon, 16 Mar 2015 19:18:31 +0000 (12:18 -0700)]
i965/fs: Print spills:fills and number of promoted constants.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Ian Romanick [Sat, 24 Jan 2015 01:31:12 +0000 (17:31 -0800)]
i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On platforms that do not natively generate 0u and ~0u for Boolean
results, b2f expressions that look like
f = b2f(expr cmp 0)
will generate better code by pretending the expression is
f = ir_triop_sel(0.0, 1.0, expr cmp 0)
This is because the last instruction of "expr" can generate the
condition code for the "cmp 0". This avoids having to do the "-(b & 1)"
trick to generate 0u or ~0u for the Boolean result. This means code like
mov(16) g16<1>F 1F
mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F
(+f0) sel(16) m6<1>F g16<8,8,1>F 0F
will be generated instead of
mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F
cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F
and(16) g4<1>D g2<8,8,1>D 1D
and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD
v2: When the comparison is either == 0.0 or != 0.0, use the knowledge
that the true (or false) case already results in zero to allow better
code generation by possibly avoiding a load-immediate instruction.
v3: Apply the optimization even when neither operand of the comparison is zero.
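The two strategies compared above can be modelled in plain C. This is only a sketch of the arithmetic, not of the i965 code generator; the function names are hypothetical, and the ">= 0" comparison is just one example of "expr cmp 0".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret IEEE-754 single-precision bits as a float. */
static float
bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* Mask-based b2f: widen a 0/1 Boolean to 0u/~0u with -(b & 1), then AND
 * with 0x3f800000, the bit pattern of 1.0f.
 */
static float
b2f_mask(uint32_t b)
{
   uint32_t mask = 0u - (b & 1u);
   return bits_to_float(mask & 0x3f800000u);
}

/* Select-based b2f: pick 1.0 or 0.0 directly from the comparison result. */
static float
b2f_select(float expr)
{
   return (expr >= 0.0f) ? 1.0f : 0.0f;
}
```

Both forms produce the same value; the select form simply skips the intermediate 0u/~0u Boolean.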
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
instructions in affected programs: 33269 -> 33096 (-0.52%)
helped: 121
Iron Lake (0x0046):
total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
instructions in affected programs: 34199 -> 34018 (-0.53%)
helped: 129
No change on other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Matt Turner [Wed, 18 Mar 2015 21:23:41 +0000 (14:23 -0700)]
util: Optimize _mesa_roundeven with SSE 4.1.
The SSE 4.1 ROUND instructions let us implement roundeven directly.
Otherwise we assume that the rounding mode has not been modified (as we
do in the rest of Mesa) and use rint().
glibc uses the ROUND instruction in rint() after a cpuid check. This
patch just lets us inline it directly when we're already building for
SSE 4.1.
Reviewed-by: Carl Worth <cworth@cworth.org>
Matt Turner [Thu, 12 Mar 2015 18:34:05 +0000 (11:34 -0700)]
util: Add a roundeven test.
Reviewed-by: Carl Worth <cworth@cworth.org>
Matt Turner [Wed, 11 Mar 2015 00:55:21 +0000 (17:55 -0700)]
mesa: Replace _mesa_round_to_even() with _mesa_roundeven().
Eric's initial patch adding constant expression evaluation for
ir_unop_round_even used nearbyint. The open-coded _mesa_round_to_even
implementation came about without much explanation after a reviewer
asked whether nearbyint depended on the application not modifying the
rounding mode. Of course (as Eric commented) we rely on the application
not changing the rounding mode from its default (round-to-nearest) in
many other places, including the IROUND function used by
_mesa_round_to_even!
Worse, IROUND() is implemented using the trunc(x + 0.5) trick which
fails for x = nextafterf(0.5, 0.0).
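The failure can be reproduced with a few lines of C. This sketch covers non-negative inputs only and uses hypothetical helper names; it is not the IROUND macro itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round by adding 0.5 and truncating (non-negative inputs only). */
static int
iround_add_half(float x)
{
   /* volatile forces the sum to be rounded to single precision */
   volatile float sum = x + 0.5f;
   return (int)sum;
}

/* Largest float below 0.5, i.e. nextafterf(0.5f, 0.0f), built from bits. */
static float
largest_below_half(void)
{
   uint32_t bits = 0x3effffffu;   /* 0.5f is 0x3f000000 */
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}
```

The input is strictly less than 0.5, so the correct result is 0. The sum x + 0.5 is not representable in single precision, though: it lies exactly halfway between two floats and rounds up to 1.0, so truncation yields 1.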
Still worse, _mesa_round_to_even unexpectedly returns an int. I suspect
that could cause problems when rounding large integral values not
representable as an int in ir_constant_expression.cpp's
ir_unop_round_even evaluation. Its use of _mesa_round_to_even is clearly
broken for doubles (as noted during review).
The constant expression evaluation code for the packing built-in
functions also mistakenly assumed that _mesa_round_to_even returned a
float, as can be seen by the cast through a signed integer type to an
unsigned (since negative float -> unsigned conversions are undefined).
rint() and nearbyint() implement the round-half-to-even behavior we want
when the rounding mode is set to the default round-to-nearest. The only
difference between them is that nearbyint() raises the inexact
exception.
This patch implements _mesa_roundeven{f,}, a function similar to the
roundeven function added by an as-yet unimplemented technical specification
(ISO/IEC TS 18661-1:2014), with a small difference in behavior -- we
don't bother raising the inexact exception, which I don't think we care
about anyway.
At least recent Intel CPUs can quickly change a subset of the bits in
the x87 floating-point control register, but the exception mask bits are
not included. rint() does not need to change these bits, but nearbyint()
does (twice: save old, set new, and restore old) in order to raise the
inexact exception, which would incur some penalty.
Reviewed-by: Carl Worth <cworth@cworth.org>
Matt Turner [Wed, 18 Mar 2015 02:17:15 +0000 (19:17 -0700)]
i965/fs: Ignore type in cmod prop if scan_inst is CMP.
total instructions in shared programs: 6263270 -> 6203091 (-0.96%)
instructions in affected programs: 2606529 -> 2546350 (-2.31%)
helped: 14301
GAINED: 5
LOST: 3
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason Ekstrand [Tue, 17 Mar 2015 19:10:58 +0000 (12:10 -0700)]
i965/nir: Make our environment variable checking smarter
Before, we enabled NIR if you set INTEL_USE_NIR to anything, which meant that
INTEL_USE_NIR=false would actually turn on NIR. In preparation for turning
NIR on by default, this commit makes it smarter by allowing the
INTEL_USE_NIR variable to work as either a force-enable or a force-disable.
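A sketch of how such a force-enable/force-disable check could look. The helper name and the set of accepted spellings are assumptions; the commit text does not list them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Interpret an environment-variable value as a Boolean override.
 * value is the string from getenv(), or NULL when the variable is unset.
 * Unset or unrecognized input falls back to default_value.
 */
static bool
parse_bool_override(const char *value, bool default_value)
{
   if (value == NULL)
      return default_value;

   if (strcmp(value, "1") == 0 || strcmp(value, "true") == 0 ||
       strcmp(value, "yes") == 0)
      return true;

   if (strcmp(value, "0") == 0 || strcmp(value, "false") == 0 ||
       strcmp(value, "no") == 0)
      return false;

   return default_value;
}
```

A caller would pass getenv("INTEL_USE_NIR") together with the compiled-in default, so flipping the default later requires no change to the parsing.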
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Dave Airlie [Mon, 16 Mar 2015 05:21:55 +0000 (15:21 +1000)]
egl: don't fill client apis string forever.
We never reset the string on eglTerminate, so it grows
forever across multiple eglInitialize calls.
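The class of bug can be sketched as follows. The struct and function are hypothetical and much simpler than the real EGL display code, and resetting at initialization time is only one possible place to put the fix.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical display record; the real EGL structures differ. */
struct display {
   char client_apis[64];
};

/* Rebuild the client-APIs string from scratch on every initialization. */
static void
display_init_client_apis(struct display *dpy, int has_gl, int has_gles)
{
   assert(dpy != NULL);

   /* Reset first, so repeated initialization does not accumulate text. */
   dpy->client_apis[0] = '\0';

   if (has_gl)
      strcat(dpy->client_apis, "OpenGL ");
   if (has_gles)
      strcat(dpy->client_apis, "OpenGL_ES ");
}
```

Without the reset, every call would append another copy of the API names to whatever the previous call left behind.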
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jose Fonseca [Wed, 18 Mar 2015 14:25:19 +0000 (14:25 +0000)]
swrast: Use BITFIELD64_BIT for arrayAttribs.
As VARYING_SLOT_MAX can be bigger than 32.
I'll probably stop building swrast with MSVC in the near future, but this
seems a real bug regardless.
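For illustration, a 64-bit single-bit mask can be built as below. BIT64 is a local stand-in written for this sketch, not a copy of Mesa's BITFIELD64_BIT definition.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a 64-bit single-bit mask macro. */
#define BIT64(b) ((uint64_t)1 << (b))

/* Test whether attribute slot "slot" is set in a 64-bit mask. */
static int
slot_is_set(uint64_t mask, unsigned slot)
{
   assert(slot < 64);
   return (mask & BIT64(slot)) != 0;
}
```

Writing 1 << slot instead shifts a plain int, which is undefined for slot >= 32 on platforms with a 32-bit int, so slots above 31 could not be tracked reliably.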
Reviewed-by: Brian Paul <brianp@vmware.com>
Jose Fonseca [Wed, 18 Mar 2015 14:22:41 +0000 (14:22 +0000)]
scons: Don't link program_lexer.l/y twice.
program/lex.yy.c and program/program_parse.tab.c are already included in
the PROGRAM_FILES variable.
We still need to specify the dependency relationship though.
Reviewed-by: Brian Paul <brianp@vmware.com>
Jose Fonseca [Wed, 18 Mar 2015 14:19:10 +0000 (14:19 +0000)]
gallivm: Use INFINITY directly.
Already done below.
Reviewed-by: Brian Paul <brianp@vmware.com>
Jose Fonseca [Wed, 18 Mar 2015 14:18:28 +0000 (14:18 +0000)]
scons: Silence MSVC warnings about overflows in constant arithmetic.
These get triggered even when using the standard C99 INFINITY/NAN
constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
José Fonseca [Tue, 25 Nov 2014 22:27:04 +0000 (22:27 +0000)]
scons: Disable MSVC signed/unsigned mismatch warnings.
By default gcc ignores the issue, and as a result code that mixes
signed/unsigned is so widespread through the code base that it ends up
being little more than noise, potentially obscuring more pertinent
warnings.
Maybe one day we will enable the corresponding gcc warnings and clean up, but
until then, this change disables them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Laura Ekstrand [Wed, 18 Mar 2015 20:26:31 +0000 (13:26 -0700)]
docs: Update progress on ARB_direct_state_access.
Acked-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Wed, 18 Mar 2015 18:25:03 +0000 (12:25 -0600)]
dri: add _glapi_set_nop_handler(), _glapi_new_nop_table() to dri_test.c
I wasn't aware of these _glapi_ stub functions when I committed
4bdbb588a9d385509f9168e38bfdb76952ba469c. Fixes "make check"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89662
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Brian Paul [Tue, 17 Mar 2015 17:57:34 +0000 (11:57 -0600)]
mesa: remove MSVC warning pragmas
Removing this block of pragmas doesn't seem to increase the number of
warnings generated by MSVC. Other than signed/unsigned comparison warnings,
there are very few other warnings nowadays.
Acked-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Tue, 17 Mar 2015 17:50:35 +0000 (11:50 -0600)]
mesa: add void to format_array_format_table_init() declaration
Silences an MSVC warning where it's called from call_once().
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 13 Mar 2015 19:12:12 +0000 (13:12 -0600)]
mapi: move some #includes from .h file to .c files
Just include things where they're needed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Fri, 13 Mar 2015 18:05:01 +0000 (12:05 -0600)]
mesa: make _mesa_alloc_dispatch_table() static
Never called from outside of context.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Fri, 13 Mar 2015 17:43:44 +0000 (11:43 -0600)]
mesa: reimplement dispatch table no-op function handling
Use the new _glapi_new_nop_table() and _glapi_set_nop_handler() to
improve how we handle calling no-op GL functions.
If there's a current context for the calling thread, generate a
GL_INVALID_OPERATION error. This will happen if the app calls an
unimplemented extension function or it calls an illegal function
between glBegin/glEnd.
If there's no current context, print an error to stdout if it's a debug
build.
The dispatch_sanity.cpp file has some previous checks removed since
the _mesa_generic_nop() function no longer exists.
This fixes the piglit gl-1.0-dlist-begin-end and gl-1.0-beginend-coverage
tests on Windows.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Fri, 13 Mar 2015 16:20:29 +0000 (10:20 -0600)]
mapi: add new _glapi_new_nop_table() and _glapi_set_nop_handler()
_glapi_new_nop_table() creates a new dispatch table populated with
pointers to no-op functions.
_glapi_set_nop_handler() is used to register a callback function which
will be called from each of the no-op functions.
Now we always generate a separate no-op function for each GL entrypoint.
This allows us to do proper stack clean-up for Windows __stdcall and
lets us report the actual function name in error messages. Before this
change, for non-Windows release builds we used a single no-op function
for all entrypoints.
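The per-entrypoint approach can be sketched in a few lines. All names here are hypothetical, and every entrypoint is given a void(void) signature for brevity, which the real GL entrypoints do not share.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*nop_handler_fn)(const char *name);
typedef void (*entry_fn)(void);

static nop_handler_fn nop_handler;   /* registered callback, may be NULL */

/* Register the callback that every no-op entrypoint forwards to. */
static void
set_nop_handler(nop_handler_fn fn)
{
   nop_handler = fn;
}

/* Generate one distinct no-op per entrypoint so its name is known. */
#define DEFINE_NOP(name)          \
   static void nop_##name(void)   \
   {                              \
      if (nop_handler)            \
         nop_handler(#name);      \
   }

DEFINE_NOP(glBegin)
DEFINE_NOP(glEnd)

/* A tiny "dispatch table" populated entirely with no-ops. */
static const entry_fn nop_table[] = { nop_glBegin, nop_glEnd };

/* Example handler: remember which entrypoint was hit. */
static const char *last_nop_name;

static void
record_name(const char *name)
{
   last_nop_name = name;
}
```

Because each table slot has its own function, the handler receives the actual entrypoint name; a single shared no-op could not tell the callers apart.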
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rob Clark [Wed, 18 Mar 2015 13:51:57 +0000 (09:51 -0400)]
freedreno/ir3: fix infinite recursion in sched
One more case we need to handle. One of the src instructions for the
indirect could also end up being ourself.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 18 Mar 2015 13:51:27 +0000 (09:51 -0400)]
freedreno: fix spelling
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Marek Olšák [Wed, 18 Mar 2015 00:50:03 +0000 (01:50 +0100)]
docs/GL3: don't list nv30
Suggested by Ilia Mirkin.
Marek Olšák [Mon, 16 Mar 2015 22:19:17 +0000 (23:19 +0100)]
docs/GL3: don't list swrast
Let's face it: This driver is unlikely to get more love.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marek Olšák [Mon, 16 Mar 2015 22:15:22 +0000 (23:15 +0100)]
docs/GL3: don't list r300
r300g already supports everything it can. There's no point in listing
the driver here.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marek Olšák [Tue, 17 Mar 2015 16:47:17 +0000 (17:47 +0100)]
radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords
radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)
Discovered by Coverity. Reported by Ilia Mirkin.
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Jonathan Gray [Tue, 17 Mar 2015 02:16:05 +0000 (13:16 +1100)]
configure: check if compiler supports -Werror=vla.
Check if the compiler supports -Werror=vla before using it.
-Wvla was introduced with GCC 4.3 and is not present in 4.2.
Fixes the build on OpenBSD.
v2: Fix statement order, and quote $save_CFLAGS.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89433
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
Chris Wilson [Wed, 11 Mar 2015 12:21:29 +0000 (12:21 +0000)]
i965: Defer the throttle until we submit new commands
Currently, we throttle before the user begins preparing commands for the
next frame when we acquire the draw/read buffers. However, construction
of the command buffer can itself take significant time relative to the
frame time. If we move the throttle from the buffer acquire to the
command submit phase we can allow the user to improve concurrency
between the CPU and GPU (i.e. reduce the amount of time we waste inside
the throttle).
v2: Whitespace + delay throttling until after the next submission for
greater parallelism
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Chris Wilson [Fri, 19 Sep 2014 09:10:13 +0000 (10:10 +0100)]
i965: Throttle to the previous frame
In order to facilitate the concurrency offered by triple buffering and to
offset the latency induced by swapping via an external process, which
may incur extra rendering itself, only throttle to the previous frame
and not the last. The second issue that mostly affects swap benchmarks,
but also can incur jitter in the throttling, is that the throttle bo is
closer to the next SwapBuffers rather than immediately after the previous
SwapBuffers. Throttling to the previous frame doubles the maximum possible
latency at the benefit of improving throughput and reducing jitter.
v2: Rename "first_post_swapbuffer" batches array to a plain
throttle_batch[] as the pluralisation was contorting the name and not
making it clear as to whether it was the first batch or first_post_swap
batch. Not least of which was that not all throttle points are SwapBuffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Chris Wilson [Thu, 26 Feb 2015 11:25:18 +0000 (11:25 +0000)]
i965: Throttle rendering to an fbo
When rendering to an fbo, even though it may be acting as a winsys
frontbuffer or just generally, we never throttle. However, when rendering
to an fbo, there is no natural frame boundary. Conventionally we use
SwapBuffers and glFinish, but potential callers often avoid glFinish for
being too heavy handed (waiting on all outstanding rendering to complete).
The kernel provides a soft-throttling option for this case that waits for
rendering older than 20ms to be complete (that's a little too lax to be
used for swapbuffers, but is here a useful safety net). The remaining
choice is then either never to throttle, to throttle after every draw call,
or to throttle at an intermediate user-defined point such as glFlush (and
thus all the implied flushes). This patch opts for the last option as that
is the current method used for flushing to front buffers.
v2: Defer the throttling from inside the flush to the next
intel_prepare_render() and switch non-fbo frontbuffer throttling over to
use the same lax method. The issue being that
glFlush()/intel_prepare_read() is just as likely to be called inside a
tight loop and not at "frame" boundaries.
v3: Rename from need_front_throttle to need_flush_throttle to avoid any
ambiguity between front buffer rendering and fbo rendering. (Chad)
v4: Whitespace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Jason Ekstrand [Mon, 16 Mar 2015 22:08:04 +0000 (15:08 -0700)]
nir/peephole_select: Allow uniform/input loads and load_const
Shader-db results on HSW:
total instructions in shared programs: 4174156 -> 4157291 (-0.40%)
instructions in affected programs: 145397 -> 128532 (-11.60%)
helped: 383
HURT: 0
GAINED: 20
LOST: 22
There are two more tests lost than gained. However, comparing this with
GLSL IR vs. NIR results, the overall delta is reduced from 85/44
gained/lost on current master to 71/32 with this commit. Therefore, I
think it's probably a boon since we are getting "closer" to where we were
before.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Mon, 16 Mar 2015 21:55:00 +0000 (14:55 -0700)]
nir/peephole_select: Copy instructions into the block before the if
Previously we tried to do poor-man's copy propagation as we created the
select instructions. Instead, this commit just moves the instructions from
the blocks inside the if into the block before. Copy propagation will take
care of making sure we don't have any extra mov's in there for us.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Mon, 16 Mar 2015 21:45:54 +0000 (14:45 -0700)]
nir/peephole_select: Rename are_all_move_to_phi and use a switch
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Mario Kleiner [Thu, 12 Mar 2015 19:34:06 +0000 (20:34 +0100)]
glx: Handle out-of-sequence swap completion events correctly. (v2)
The code for emitting INTEL_swap_events swap completion
events needs to translate from 32-Bit sbc on the wire to
64-Bit sbc for the events and handle wraparound accordingly.
It assumed that events would be sent by the server in the
order their corresponding swap requests were emitted from
the client, iow. sbc count should be always increasing. This
was correct for DRI2.
This is not always the case under the DRI3/Present backend,
where the Present extension can execute presents and send out
completion events in a different order than the submission
order of the present requests, due to client code specifying
targetMSC target vblank counts which are not strictly
monotonically increasing. This confused the wraparound
handling. This patch fixes the problem by handling 32-Bit
wraparound in both directions. As long as the real 64-Bit sbc's of
successive swap completion events don't differ by more
than 2^30, this should be able to do the right thing.
How this is supposed to work:
awire->sbc contains the low 32-Bits of the true 64-Bit sbc
of the current swap event, transmitted over the wire.
glxDraw->lastEventSbc contains the low 32-Bits of the 64-Bit
sbc of the most recently processed swap event.
glxDraw->eventSbcWrap is a 64-Bit offset which tracks the upper
32-Bits of the current sbc. The final 64-Bit output sbc
aevent->sbc is computed from the sum of awire->sbc and
glxDraw->eventSbcWrap.
Under DRI3/Present, swap completion events can be received
slightly out of order due to non-monotonic targetMsc specified
by client code, e.g., present request submission:
Submission sbc:  1   2   3
targetMsc:      10  11   9
Reception of completion events:
Completion sbc:  3   1   2
The completion sequence 3, 1, 2 would confuse the old wraparound
handling made for DRI2 as 1 < 3 --> Assumes a 32-Bit wraparound
has happened when it hasn't.
The client can queue multiple present requests, in the case of
Mesa up to n requests for n-buffered rendering, e.g., n = 2-4 in
the current Mesa GLX DRI3/Present implementation. In the case of
direct Pixmap presents via xcb_present_pixmap() the number n is
limited by the amount of memory available.
We reasonably assume that the number of outstanding requests n is
much less than 2 billion due to memory constraints and common sense.
Therefore while the order of received sbc's can be a bit scrambled,
successive 64-Bit sbc's won't deviate by much, a given sbc may be
a few counts lower or higher than the previous received sbc.
Therefore any large difference between the incoming awire->sbc and
the last recorded glxDraw->lastEventSbc will be due to 32-Bit
wraparound and we need to adapt glxDraw->eventSbcWrap accordingly
to adjust the upper 32-Bits of the sbc.
Two cases, corresponding to the two if-statements in the patch:
a) Previous sbc event was below the last 2^32 boundary, in the previous
glxDraw->eventSbcWrap epoch, the new sbc event is in the next 2^32
epoch, therefore the low 32-Bit awire->sbc wrapped around to zero,
or close to zero --> awire->sbc is apparently much lower than the
glxDraw->lastEventSbc recorded for the previous epoch
--> We need to increment glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch to be one higher than the previous one.
--> Case a) also handles the old DRI2 behaviour.
b) Previous sbc event was above closest 2^32 boundary, but now a
late event from the previous 2^32 epoch arrives, with a true sbc
that belongs to the previous 2^32 segment, so the awire->sbc of
this late event has a high count close to 2^32, whereas
glxDraw->lastEventSbc is closer to zero --> awire->sbc is much
greater than glxDraw->lastEventSbc.
--> We need to decrement glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch back to the previous lower epoch of this late
completion event.
We assume such a wraparound to a higher (a) epoch or lower (b)
epoch has happened if awire->sbc and glxDraw->lastEventSbc differ
by more than 2^30 counts, as such a difference can only happen
on wraparound, or if somehow 2^30 present requests would be pending
for a given drawable inside the server, which is rather unlikely.
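The two cases can be sketched as a small standalone function. The struct below is a hypothetical stand-in that only borrows the field names used above; it is not the real GLX drawable type, and the function is an illustration rather than the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-drawable state mirroring the fields named above. */
struct sbc_state {
   uint32_t lastEventSbc;   /* low 32 bits of the last processed sbc */
   uint64_t eventSbcWrap;   /* offset tracking the upper 32 bits */
};

/* Extend a 32-bit wire sbc to 64 bits, tolerating out-of-order events. */
static uint64_t
sbc_from_wire(struct sbc_state *st, uint32_t wire_sbc)
{
   int64_t cur = (int64_t)wire_sbc;
   int64_t last = (int64_t)st->lastEventSbc;

   /* Case a): wire value far below the last one, so we entered the
    * next 2^32 epoch.
    */
   if (cur < last - INT64_C(0x40000000))
      st->eventSbcWrap += UINT64_C(0x100000000);

   /* Case b): wire value far above the last one, so this is a late
    * event from the previous 2^32 epoch.
    */
   if (cur > last + INT64_C(0x40000000))
      st->eventSbcWrap -= UINT64_C(0x100000000);

   st->lastEventSbc = wire_sbc;
   return (uint64_t)wire_sbc + st->eventSbcWrap;
}
```

With the completion order 3, 1, 2 from the example above, neither branch fires and the values pass through unchanged. A wire value of 1 arriving after 0xFFFFFFFE triggers case a), and a late 0xFFFFFFFF arriving after that triggers case b).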
v2: Explain the reason for this patch and the new wraparound handling
much more extensively in the commit message; no code change wrt. the
initial version.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Emil Velikov [Mon, 16 Mar 2015 14:47:09 +0000 (14:47 +0000)]
r600g: constify r600_shader_tgsi_instruction lists.
Massive list of constant data. Annotate it as such.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Mon, 16 Mar 2015 14:47:08 +0000 (14:47 +0000)]
r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3}
Both of which are no longer used. Use designated initializer to make
things obvious as people add/remove TGSI_OPCODEs.
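A small sketch of the designated-initializer pattern. The enum, struct, and table are made up for illustration and are not the real TGSI or r600 definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcode enum and per-opcode record. */
enum opcode { OP_MOV, OP_ADD, OP_MUL, OP_COUNT };

struct op_info {
   const char *name;
   unsigned num_src;
};

/* Each entry names its index, so the written order does not matter and
 * a removed or renumbered opcode cannot silently shift its neighbours.
 */
static const struct op_info op_table[OP_COUNT] = {
   [OP_MOV] = { "mov", 1 },
   [OP_MUL] = { "mul", 2 },
   [OP_ADD] = { "add", 2 },
};

/* Look up the record for an opcode. */
static const struct op_info *
op_lookup(enum opcode op)
{
   assert((unsigned)op < OP_COUNT);
   return &op_table[op];
}
```

With positional initializers the table would have to repeat the opcode inside each entry, or rely on the entries staying in exactly the enum's order.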
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Mon, 16 Mar 2015 14:47:07 +0000 (14:47 +0000)]
r600g: use the tgsi opcode from parse.FullToken.FullInstruction
... rather than the local one in inst_info->tgsi_opcode.
This will allow us to simplify struct r600_shader_tgsi_instruction.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ian Romanick [Sat, 28 Feb 2015 16:32:57 +0000 (08:32 -0800)]
i965/fs: Apply gl_FrontFacing ? -1 : 1 optimization only for floats
At the very least, unreal4/sun-temple/102.shader_test uses this pattern
for a signed integer result. However, that shader did not hit the
optimization in the first place because it uses !gl_FrontFacing. I
changed the shader to remove the logical-not and reverse the other
operands. I verified that incorrect code is generated before this
change and correct code is generated after.
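The commit text does not say how the optimization builds its result. Assuming it relies on manipulating the sign bit of 1.0f, the sketch below shows why that would only be valid for floats: the float values 1.0 and -1.0 differ in the sign bit alone, while the integers 1 and -1 differ in almost every bit. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float. */
static uint32_t
float_bits(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);
   return bits;
}

/* XOR of two patterns: the set bits are exactly the bits that differ. */
static uint32_t
differing_bits(uint32_t a, uint32_t b)
{
   return a ^ b;
}
```

For the float pair the difference is 0x80000000; for the two's-complement integer pair it is 0xfffffffe.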
Fixes fs-frontfacing-ternary-1-neg-1.shader_test.
No shader-db changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Sat, 28 Feb 2015 16:26:37 +0000 (08:26 -0800)]
i965/fs: Change try_opt_frontfacing_ternary to eliminate asserts
If we check for the case that is actually necessary, the asserts
become superfluous.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Tue, 3 Feb 2015 19:12:28 +0000 (21:12 +0200)]
i965/fs: Handle CMP.nz ... 0 and AND.nz ... 1 similarly in cmod propagation
Especially on platforms that do not natively generate 0u and ~0u for
Boolean results, we generate a lot of sequences where a CMP is
followed by an AND with 1. emit_bool_to_cond_code does this, for
example. On ILK, this results in a sequence like:
add(8) g3<1>F g8<8,8,1>F -g4<0,1,0>F
cmp.l.f0(8) g3<1>D g3<8,8,1>F 0F
and.nz.f0(8) null g3<8,8,1>D 1D
(+f0) iff(8) Jump: 6
The AND.nz is obviously redundant. By propagating the cmod, we can
instead generate
add.l.f0(8) null g8<8,8,1>F -g4<0,1,0>F
(+f0) iff(8) Jump: 6
Existing code already handles the propagation from the CMP to the ADD.
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3550829 -> 3550788 (-0.00%)
instructions in affected programs: 10028 -> 9987 (-0.41%)
helped: 24
Iron Lake (0x0046):
total instructions in shared programs: 4993146 -> 4993105 (-0.00%)
instructions in affected programs: 9675 -> 9634 (-0.42%)
helped: 24
Ivy Bridge (0x0166):
total instructions in shared programs: 6291870 -> 6291794 (-0.00%)
instructions in affected programs: 17914 -> 17838 (-0.42%)
helped: 48
Haswell (0x0426):
total instructions in shared programs: 5779256 -> 5779180 (-0.00%)
instructions in affected programs: 16694 -> 16618 (-0.46%)
helped: 48
Broadwell (0x162E):
total instructions in shared programs: 6823088 -> 6823014 (-0.00%)
instructions in affected programs: 15824 -> 15750 (-0.47%)
helped: 46
No change on Sandy Bridge or on any platform when NIR is used.
v2: Add unit tests suggested by Matt. Remove spurious writes_flag()
check on scan_inst when scan_inst is known to be BRW_OPCODE_CMP (also
suggested by Matt).
v3: Fix some comments and remove some explicit int() casts in fs_reg
constructors in the unit tests. Both suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Thu, 5 Mar 2015 01:27:21 +0000 (17:27 -0800)]
i965: Mark paths in linear <-> tiled functions as unreachable().
   text    data     bss     dec     hex filename
   9663       0       0    9663    25bf intel_tiled_memcpy.o before
   8215       0       0    8215    2017 intel_tiled_memcpy.o after
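A sketch of the general technique, assuming a GCC-compatible compiler for the __builtin_unreachable() path. The UNREACHABLE macro, the enum, and the function are hypothetical and are not the code touched by this patch.

```c
#include <assert.h>

/* Assert in debug builds and tell the optimizer the path cannot happen. */
#if defined(__GNUC__)
#define UNREACHABLE(msg)            \
   do {                             \
      assert(!msg);                 \
      __builtin_unreachable();      \
   } while (0)
#else
#define UNREACHABLE(msg) assert(!msg)
#endif

/* Hypothetical tiling modes, for illustration only. */
enum tiling { TILING_X, TILING_Y };

/* Width of one tile row in bytes for the given tiling mode. */
static unsigned
tile_width_bytes(enum tiling t)
{
   switch (t) {
   case TILING_X:
      return 512;
   case TILING_Y:
      return 128;
   }
   UNREACHABLE("invalid tiling");
   return 0;
}
```

Marking the fall-through path as unreachable lets the compiler drop the code it would otherwise emit for impossible inputs, which is consistent with the size reduction reported above.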
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Matt Turner [Sat, 14 Mar 2015 00:00:26 +0000 (17:00 -0700)]
egl: Remove eglQueryString virtual dispatch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Laura Ekstrand [Tue, 17 Mar 2015 20:27:31 +0000 (13:27 -0700)]
main: Correct _mesa_error with no format in bufferobj.c.
This fixes Bug 89616, a build failure due to line 1639 of bufferobj.c:
_mesa_error(ctx, GL_INVALID_OPERATION, func);
Trivial.
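The usual remedy for this kind of failure is to pass the string as an argument to a literal "%s" format. The sketch below uses a hypothetical printf-style reporter rather than _mesa_error itself, and the exact form of the fix in bufferobj.c is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_error[128];

/* Hypothetical printf-style error reporter. */
static void
report_error(const char *fmt, ...)
{
   va_list args;
   va_start(args, fmt);
   vsnprintf(last_error, sizeof last_error, fmt, args);
   va_end(args);
}

/* Pass the caller-supplied string as an argument, never as the format. */
static void
report_func_error(const char *func)
{
   assert(func != NULL);
   report_error("%s", func);
}
```

Compilers with format-security checks reject a non-literal format with no arguments, and a string containing a '%' would otherwise be interpreted as a conversion specification.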
Laura Ekstrand [Thu, 12 Feb 2015 00:53:46 +0000 (16:53 -0800)]
main: Cosmetic changes to GetBufferSubData.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Tue, 20 Jan 2015 23:24:53 +0000 (15:24 -0800)]
main: Add entry point for GetNamedBufferSubData.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Mon, 16 Mar 2015 23:08:36 +0000 (16:08 -0700)]
main: Cosmetic updates to GetBufferPointerv.
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Tue, 20 Jan 2015 22:32:35 +0000 (14:32 -0800)]
main: Add entry point for GetNamedBufferPointerv.
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Thu, 12 Feb 2015 00:10:20 +0000 (16:10 -0800)]
main: Add entry points for GetNamedBufferParameteri[64]v.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Thu, 12 Feb 2015 00:07:45 +0000 (16:07 -0800)]
main: Refactor GetBufferParameteri[64]v.
v2: Split into a refactor commit and an entry point commit.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Thu, 12 Feb 2015 00:06:52 +0000 (16:06 -0800)]
main: Add entry point for FlushMappedNamedBufferRange.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Thu, 15 Jan 2015 01:01:20 +0000 (17:01 -0800)]
main: Refactor FlushMappedBufferRange.
v2:-Remove "_mesa" from in front of static software fallback.
-Split out the refactor from the addition of the DSA entry points.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Wed, 14 Jan 2015 22:52:01 +0000 (14:52 -0800)]
main: Add entry point for UnmapNamedBuffer.
v2: review from Ian Romanick
- Restore VBO_DEBUG and BOUNDS_CHECK
- Remove _mesa from static software fallback unmap_buffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Wed, 11 Feb 2015 22:09:52 +0000 (14:09 -0800)]
main: Add entry points for MapNamedBuffer[Range].
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Wed, 14 Jan 2015 20:44:39 +0000 (12:44 -0800)]
main: Refactor MapBuffer[Range].
v2: review from Jason Ekstrand
- Split refactor from addition of DSA entry points.
review from Ian Romanick
- Remove "_mesa" from static software fallback map_buffer_range
- Restore VBO_DEBUG and BOUNDS_CHECK
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Wed, 11 Feb 2015 19:45:57 +0000 (11:45 -0800)]
main: Minor whitespace fixes in ClearNamedBuffer[Sub]Data.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Laura Ekstrand [Wed, 11 Feb 2015 20:17:38 +0000 (12:17 -0800)]
main: Add entry points for ClearNamedBuffer[Sub]Data.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Tue, 13 Jan 2015 23:20:19 +0000 (15:20 -0800)]
main: Refactor ClearBuffer[Sub]Data.
v2: review by Jason Ekstrand
- Split refactor of clear buffer sub data from addition of DSA entry
points.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Tue, 13 Jan 2015 21:28:08 +0000 (13:28 -0800)]
main: Add entry point for CopyNamedBufferSubData.
v2: remove _mesa in front of static software fallback.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Wed, 11 Feb 2015 19:06:42 +0000 (11:06 -0800)]
main: Improve errors and style in BufferSubData.
- More explicit error reporting.
- Removed legacy style.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Tue, 13 Jan 2015 19:28:17 +0000 (11:28 -0800)]
main: Add entry point for NamedBufferSubData.
v2: review by Ian Romanick
- Remove "_mesa" from name of static software fallback buffer_sub_data.
- Remove mappedRange from _mesa_buffer_sub_data.
- Removed some cosmetic changes to a separate commit.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Tue, 10 Feb 2015 01:57:46 +0000 (17:57 -0800)]
main: Add entry point for NamedBufferData.
v2: review from Ian Romanick
- Fix space in ARB_direct_state_access.xml.
- Remove "_mesa" from the name of buffer_data static fallback.
- Restore VBO_DEBUG and BOUNDS_CHECK.
- Fix beginning of comment to start on same line as /*
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Sat, 10 Jan 2015 00:17:10 +0000 (16:17 -0800)]
main: Add entry point for NamedBufferStorage.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Fri, 19 Dec 2014 01:10:06 +0000 (17:10 -0800)]
main: Add entry point for CreateBuffers.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Laura Ekstrand [Tue, 17 Mar 2015 16:43:52 +0000 (09:43 -0700)]
Revert "main: _mesa_cube_level_complete checks NumLayers."
This reverts commit 1ee000a0b6737d6c140d4f07b6044908b8ebfdc7.
Failures with the GLES3 conformance suite and Synmark2 OGLHdrBloom revealed
that this commit was in error.
Extensive testing with Piglit prior to patch review and upstreaming did not
reveal this problem because, in the few Piglit tests that test for cube
completeness, NumLayers = 6. This is because all of the existing tests use
TextureStorage to initialize the texture, which sets NumLayers.
A new Piglit test has been sent to the mailing list that reproduces the bug
related to this patch ("texturing: Testing
glGenerateMipmap(GL_TEXTURE_CUBE_MAP) without glTexStorage2D").
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Neil Roberts [Thu, 12 Mar 2015 17:41:07 +0000 (17:41 +0000)]
i965/skl: Send a message header when doing constant loads SIMD4x2
Commit 0ac4c272755c7 made it add a header for the send message when
using SIMD4x2 on Skylake because without this it will end up using
SIMD8D. However the patch missed the case when a sampler is being used
to implement constant loads from a buffer surface in a SIMD4x2 vertex
shader.
This fixes 29 Piglit tests, mostly related to the ARL instruction in
vertex programs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Tapani Pälli [Mon, 16 Mar 2015 08:08:08 +0000 (10:08 +0200)]
i965/fs: in MAD optimizations, switch last argument to be immediate
Commit bb33a31 introduced optimizations that transform cases of MAD
into simpler forms, but it did not take into account that src[0]
cannot be immediate and did not report progress. This patch switches
src[0] and src[1] if src[0] is immediate and adds progress
reporting. If both sources are immediates, this is taken care of by
the same opt_algebraic pass on a later run.
v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
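The operand swap described in this commit can be sketched in isolation. The following is a minimal illustration only: `struct reg`, `struct two_src_inst` and `move_immediate_last` are hypothetical stand-ins, not the actual i965 `fs_reg`/`fs_inst` types, and the swap is only valid for commutative operations such as ADD or MUL.

```c
#include <stdbool.h>

/* Hypothetical, simplified register description (not the real fs_reg). */
enum reg_file { REG_GRF, REG_IMM };

struct reg {
    enum reg_file file;
    float value;            /* payload used only by this sketch */
};

/* Hypothetical two-source instruction produced by simplifying a MAD. */
struct two_src_inst {
    struct reg src[2];
};

/*
 * If src[0] is an immediate and src[1] is not, swap them so that the
 * immediate ends up last. Returns true when the instruction changed,
 * which is what the caller would report as "progress".
 * When both sources are immediates nothing is done here; the commit
 * message leaves that case to a later run of the same pass.
 */
bool
move_immediate_last(struct two_src_inst *inst)
{
    if (inst->src[0].file == REG_IMM && inst->src[1].file != REG_IMM) {
        struct reg tmp = inst->src[0];
        inst->src[0] = inst->src[1];
        inst->src[1] = tmp;
        return true;
    }
    return false;
}
```

Returning the "changed" flag matters because an optimization loop typically reruns its passes until no pass reports progress.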
Vinson Lee [Sat, 14 Mar 2015 08:45:03 +0000 (01:45 -0700)]
common.py: Fix PEP 8 issues.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Fri, 13 Mar 2015 22:45:20 +0000 (23:45 +0100)]
gallivm: abort properly when running out of buffer space in lp_disassembly
Before this actually ran into an infinite loop printing out "invalid"...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Marek Olšák [Mon, 16 Mar 2015 22:24:15 +0000 (23:24 +0100)]
docs/GL3: also mark GLES3/GS5 for radeonsi as done
Emil Velikov [Mon, 16 Mar 2015 15:00:19 +0000 (15:00 +0000)]
st/dri: remove unused include from the automake/scons build
st/dri/common hasn't been around for a while.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Emil Velikov [Mon, 16 Mar 2015 15:00:18 +0000 (15:00 +0000)]
auxiliary/os: fix the android build - s/drm_munmap/os_munmap/
Squash this silly typo introduced with commit c63eb5dd5ec
(auxiliary/os: get the mmap/munmap wrappers working with android)
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Emil Velikov [Mon, 16 Mar 2015 11:50:47 +0000 (11:50 +0000)]
gallium/sw/kms: trivial cleanups
Remove the forward declaration and make use of the DEBUG_PRINT macro for
debug builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Emil Velikov [Wed, 11 Mar 2015 19:12:35 +0000 (19:12 +0000)]
loader: include <sys/stat.h> for non-sysfs builds
Required by fstat(), otherwise we'll error out due to implicit function
declaration.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89530
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Vadim Rutkovsky <vrutkovs@redhat.com>
Tested-by: Vadim Rutkovsky <vrutkovs@redhat.com>
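As a small illustration of why the include is needed: `fstat()` and `struct stat` are declared in `<sys/stat.h>`, so a caller that omits the header relies on an implicit declaration. The helper name `fd_is_char_device` below is hypothetical and is not taken from the loader code.

```c
#include <sys/types.h>
#include <sys/stat.h>   /* declares fstat(), struct stat and S_ISCHR() */

/*
 * Hypothetical helper: returns 1 if fd refers to a character device,
 * 0 if it refers to something else, and -1 if fstat() fails.
 */
int
fd_is_char_device(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```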
Felix Janda [Mon, 2 Feb 2015 19:04:16 +0000 (20:04 +0100)]
c11/threads: Use PTHREAD_MUTEX_RECURSIVE by default
Previously PTHREAD_MUTEX_RECURSIVE_NP had been used on linux for
compatibility with old glibc. Since mesa defines __GNU_SOURCE__
on linux, PTHREAD_MUTEX_RECURSIVE has also been available since at least
1998. So we can unconditionally use the portable version
PTHREAD_MUTEX_RECURSIVE.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88534
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
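For reference, creating a recursive mutex with the portable constant looks roughly like this. It is a sketch, not the code from c11/threads; the helper name `init_recursive_mutex` is hypothetical, and the feature-test macro is only there so the constant is visible under strict compiler modes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes PTHREAD_MUTEX_RECURSIVE */
#endif
#include <pthread.h>

/*
 * Hypothetical helper: initialise *mtx as a recursive mutex.
 * Returns 0 on success or the error number from the failing call.
 */
int
init_recursive_mutex(pthread_mutex_t *mtx)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret != 0)
        return ret;

    /* Portable POSIX constant, no _NP suffix required. */
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0)
        ret = pthread_mutex_init(mtx, &attr);

    pthread_mutexattr_destroy(&attr);
    return ret;
}
```

A mutex initialised this way can be locked again by the thread that already holds it, as long as each lock is paired with an unlock.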
Marek Olšák [Sat, 28 Feb 2015 13:31:45 +0000 (14:31 +0100)]
radeonsi: implement TGSI_OPCODE_BFI (v2)
v2: Don't use the intrinsics, the shader backend can recognize these
patterns and generates optimal code automatically.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
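The kind of shift-and-mask pattern a backend can recognize for a bitfield insert can be written out in plain C. This is a sketch of the generic operation under the usual bitfieldInsert semantics, not the radeonsi implementation; the function name is hypothetical and `offset + bits <= 32` with `offset < 32` is assumed.

```c
#include <stdint.h>

/*
 * Replace `bits` bits of `base`, starting at bit `offset`, with the
 * low `bits` bits of `insert`.
 * Assumes offset < 32 and offset + bits <= 32.
 */
uint32_t
bitfield_insert(uint32_t base, uint32_t insert, unsigned offset, unsigned bits)
{
    /* A shift by 32 is undefined in C, so handle the full-width case. */
    uint32_t mask = (bits >= 32) ? UINT32_MAX
                                 : ((UINT32_C(1) << bits) - 1);

    mask <<= offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```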
Marek Olšák [Fri, 27 Feb 2015 18:09:30 +0000 (19:09 +0100)]
radeonsi: add a helper for extracting bitfields from parameters (v2)
This will be used a lot (especially by tessellation).
v2: don't use the bfe intrinsic
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
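Extracting a bitfield without a dedicated intrinsic comes down to a shift followed by a mask. The sketch below shows that pattern in plain C; `unpack_bits` is a hypothetical name, not the helper added by this commit, and `shift < 32` is assumed.

```c
#include <stdint.h>

/*
 * Return the `bits`-wide unsigned field of `value` that starts at bit
 * `shift`. Assumes shift < 32.
 */
uint32_t
unpack_bits(uint32_t value, unsigned shift, unsigned bits)
{
    /* A shift by 32 is undefined in C, so handle the full-width case. */
    uint32_t mask = (bits >= 32) ? UINT32_MAX
                                 : ((UINT32_C(1) << bits) - 1);

    return (value >> shift) & mask;
}
```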
Antia Puentes [Thu, 12 Mar 2015 12:59:17 +0000 (13:59 +0100)]
i965: Emit IF/ELSE/ENDIF/WHILE JIP with type W on Gen7
IvyBridge and Haswell PRM say that the JIP should be emitted
with type W but we were using UD. The previous implementation
did not show adverse effects, but IMHO it is safer to follow
the specification thoroughly.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Marek Olšák [Sun, 15 Mar 2015 19:13:52 +0000 (20:13 +0100)]
radeonsi: move scratch reloc state setup
- move it to its own function
- do it after all states are emitted
- bump SI_MAX_DRAW_CS_DWORDS
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 18:24:13 +0000 (19:24 +0100)]
radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering lines
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 18:21:31 +0000 (19:21 +0100)]
radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change
Do it only when the line stipple state is changed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 17:53:50 +0000 (18:53 +0100)]
radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state
This requires enabling the optional GL provoking vertex behavior for quads.
+ some cosmetic changes, so that the register is set exactly the same as
on r600.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 17:20:19 +0000 (18:20 +0100)]
radeonsi: implement line and polygon smoothing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 17:11:19 +0000 (18:11 +0100)]
radeonsi: add shader code for smoothing
The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
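One plausible reading of "multiplies the alpha channel with gl_SampleMaskIn" is that the alpha value is scaled by the fraction of covered samples. The C sketch below models that arithmetic on the CPU. It is an assumption about the intent, not the shader code from this commit; both function names are hypothetical and `num_samples` is assumed to be between 1 and 32.

```c
#include <stdint.h>

/* Count the set bits in v. */
static unsigned
count_bits(uint32_t v)
{
    unsigned n = 0;

    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Scale alpha by the fraction of samples covered according to
 * sample_mask. Assumes 1 <= num_samples <= 32.
 */
float
smoothed_alpha(float alpha, uint32_t sample_mask, unsigned num_samples)
{
    /* Ignore mask bits beyond the sample count. */
    if (num_samples < 32)
        sample_mask &= (UINT32_C(1) << num_samples) - 1;

    float coverage = (float)count_bits(sample_mask) / (float)num_samples;
    return alpha * coverage;
}
```

With blending enabled, a partially covered pixel then contributes proportionally less, which is why the result resembles MSAA.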
Marek Olšák [Sun, 15 Mar 2015 16:54:29 +0000 (17:54 +0100)]
radeonsi: split sample locations into its own state atom
Sample locations are not updated as often as framebuffers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sun, 15 Mar 2015 16:14:53 +0000 (17:14 +0100)]
radeonsi: add basic code for overrasterization
This will be used for line and polygon smoothing.
This is GCN-only even though it's in shared code.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 28 Feb 2015 16:22:54 +0000 (17:22 +0100)]
radeonsi: small cleanup in si_shader_selector_key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 28 Feb 2015 16:16:57 +0000 (17:16 +0100)]
radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogue
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Fri, 13 Mar 2015 15:21:11 +0000 (16:21 +0100)]
radeonsi: add support for easy opcodes from ARB_gpu_shader5
I have to use the BFE intrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.
In the isel DAG, lowered BFE has 27 nodes (including leaves).
Marek Olšák [Sat, 28 Feb 2015 13:01:43 +0000 (14:01 +0100)]
radeonsi: implement bit-finding opcodes from ARB_gpu_shader5
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
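For readers unfamiliar with the bit-finding operations, the following C sketch shows the GLSL-style semantics of finding the least and most significant set bit of an unsigned value, returning -1 for a zero input. The function names are hypothetical, and this is a reference model rather than the radeonsi code.

```c
#include <stdint.h>

/* Index of the least significant set bit, or -1 if v is zero. */
int
find_lsb(uint32_t v)
{
    int i = 0;

    if (v == 0)
        return -1;

    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Index of the most significant set bit, or -1 if v is zero. */
int
find_msb_unsigned(uint32_t v)
{
    int i = -1;

    while (v) {
        v >>= 1;
        i++;
    }
    return i;
}
```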
Marek Olšák [Fri, 27 Feb 2015 23:30:26 +0000 (00:30 +0100)]
radeonsi: implement gl_SampleMaskIn
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Marek Olšák [Mon, 2 Mar 2015 01:40:57 +0000 (02:40 +0100)]
radeonsi: add support for SQRT
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Marek Olšák [Fri, 27 Feb 2015 23:44:19 +0000 (00:44 +0100)]
radeonsi: add support for FMA
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Marek Olšák [Fri, 27 Feb 2015 17:39:40 +0000 (18:39 +0100)]
gallium/radeon: don't use LLVMReadOnlyAttribute for ALU
None of the instructions use a pointer argument.
(+ small cosmetic changes)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Marek Olšák [Fri, 27 Feb 2015 23:34:53 +0000 (00:34 +0100)]
tgsi: handle bitwise opcodes in tgsi_opcode_infer_type (v2)
v2: set the same types as the destination type in tgsi_exec
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marek Olšák [Fri, 27 Feb 2015 23:26:31 +0000 (00:26 +0100)]
gallium: add FMA and DFMA opcodes (v3)
Needed by ARB_gpu_shader5.
v2: select DMAD for FMA with double precision
v3: add and select DFMA
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Rob Clark [Sun, 15 Mar 2015 21:59:01 +0000 (17:59 -0400)]
freedreno: update generated headers
Fix a3xx texture layer-size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>