git.libre-soc.org Git - mesa.git/log

docs/GL3: don't list r300

r300g already supports everything it can. There's no point in listing
the driver here.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords

radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)

Discovered by Coverity. Reported by Ilia Mirkin.

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

configure: check if compiler supports -Werror=vla.

Check if the compiler supports -Werror=vla before using it.
-Wvla was introduced with GCC 4.3 and is not present in 4.2.
Fixes the build on OpenBSD.

v2: Fix statement order, and quote $save_CFLAGS.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89433
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>

i965: Defer the throttle until we submit new commands

Currently, we throttle before the user begins preparing commands for the
next frame when we acquire the draw/read buffers. However, construction
of the command buffer can itself take significant time relative to the
frame time. If we move the throttle from the buffer acquire to the
command submit phase we can allow the user to improve concurrency
between the CPU and GPU (i.e. reduce the amount of time we waste inside
the throttle).

v2: Whitespace + delay throttling until after the next submission for
greater parallelism

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]

i965: Throttle to the previous frame

In order to facilitate the concurrency offered by triple buffering and to
offset the latency induced by swapping via an external process, which
may incur extra rendering itself, only throttle to the previous frame
and not the last. The second issue that mostly affects swap benchmarks,
but also can incur jitter in the throttling, is that the throttle bo is
closer to the next SwapBuffers rather than immediately after the previous
SwapBuffers. Throttling to the previous frame doubles the maximum possible
latency at the benefit of improving throughput and reducing jitter.

v2: Rename "first_post_swapbuffer" batches array to a plain
throttle_batch[] as the pluralisation was contorting the name and not
making it clear as to whether it was the first batch or first_post_swap
batch. Not least of which was that not all throttle points are SwapBuffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

i965: Throttle rendering to an fbo

When rendering to an fbo, even though it may be acting as a winsys
frontbuffer or just generally, we never throttle. However, when rendering
to an fbo, there is no natural frame boundary. Conventionally we use
SwapBuffers and glFinish, but potential callers avoid often glFinish for
being too heavy handed (waiting on all outstanding rendering to complete).
The kernel provides a soft-throttling option for this case that waits for
rendering older than 20ms to be complete (that's a little too lax to be
used for swapbuffers, but is here a useful safety net). The remaining
choice is then either never to throttle, throttle after every draw call,
or at after intermediate user defined point such as glFlush and thus all the
implied flushes. This patch opts for the latter as that is the current
method used for flushing to front buffers.

v2: Defer the throttling from inside the flush to the next
intel_prepare_render() and switch non-fbo frontbuffer throttling over to
use the same lax method. The issuing being that
glFlush()/intel_prepare_read() is just as likely to be called inside a
tight loop and not at "frame" boundaries.

v3: Rename from need_front_throttle to need_flush_throttle to avoid any
ambiguity between front buffer rendering and fbo rendering. (Chad)

v4: Whitespace

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

nir/peephole_select: Allow uniform/input loads and load_const

Shader-db results on HSW:

total instructions in shared programs: 4174156 -> 4157291 (-0.40%)
instructions in affected programs:     145397 -> 128532 (-11.60%)
helped:                                383
HURT:                                  0
GAINED:                                20
LOST:                                  22

There are two more tests lost than gained.  However, comparing this with
GLSL IR vs. NIR results, the overall delta is reduced from 85/44
gained/lost on current master to 71/32 with this commit.  Therefore, I
think it's probably a boon since we are getting "closer" to where we were
before.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir/peephole_select: Copy instructions into the block before the if

Previously we tried to do poor-man's copy propagation as we created the
select instructions. Instead, this commit just moves the instructions from
the blocks inside the if into the block before. Copy propagation will take
care of making sure we don't have any extra mov's in there for us.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir/peephole_select: Rename are_all_move_to_phi and use a switch

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

glx: Handle out-of-sequence swap completion events correctly. (v2)

The code for emitting INTEL_swap_events swap completion
events needs to translate from 32-Bit sbc on the wire to
64-Bit sbc for the events and handle wraparound accordingly.

It assumed that events would be sent by the server in the
order their corresponding swap requests were emitted from
the client, iow. sbc count should be always increasing. This
was correct for DRI2.

This is not always the case under the DRI3/Present backend,
where the Present extension can execute presents and send out
completion events in a different order than the submission
order of the present requests, due to client code specifying
targetMSC target vblank counts which are not strictly
monotonically increasing. This confused the wraparound
handling. This patch fixes the problem by handling 32-Bit
wraparound in both directions. As long as successive swap
completion events real 64-Bit sbc's don't differ by more
than 2^30, this should be able to do the right thing.

How this is supposed to work:

awire->sbc contains the low 32-Bits of the true 64-Bit sbc
of the current swap event, transmitted over the wire.

glxDraw->lastEventSbc contains the low 32-Bits of the 64-Bit
sbc of the most recently processed swap event.

glxDraw->eventSbcWrap is a 64-Bit offset which tracks the upper
32-Bits of the current sbc. The final 64-Bit output sbc
aevent->sbc is computed from the sum of awire->sbc and
glxDraw->eventSbcWrap.

Under DRI3/Present, swap completion events can be received
slightly out of order due to non-monotic targetMsc specified
by client code, e.g., present request submission:

Submission sbc:   1   2   3
targetMsc:        10  11  9

Reception of completion events:
Completion sbc:   3   1   2

The completion sequence 3, 1, 2 would confuse the old wraparound
handling made for DRI2 as 1 < 3 --> Assumes a 32-Bit wraparound
has happened when it hasn't.

The client can queue multiple present requests, in the case of
Mesa up to n requests for n-buffered rendering, e.g., n =  2-4 in
the current Mesa GLX DRI3/Present implementation. In the case of
direct Pixmap presents via xcb_present_pixmap() the number n is
limited by the amount of memory available.

We reasonably assume that the number of outstanding requests n is
much less than 2 billion due to memory contraints and common sense.
Therefore while the order of received sbc's can be a bit scrambled,
successive 64-Bit sbc's won't deviate by much, a given sbc may be
a few counts lower or higher than the previous received sbc.

Therefore any large difference between the incoming awire->sbc and
the last recorded glxDraw->lastEventSbc will be due to 32-Bit
wraparound and we need to adapt glxDraw->eventSbcWrap accordingly
to adjust the upper 32-Bits of the sbc.

Two cases, correponding to the two if-statements in the patch:

a) Previous sbc event was below the last 2^32 boundary, in the previous
glxDraw->eventSbcWrap epoch, the new sbc event is in the next 2^32
epoch, therefore the low 32-Bit awire->sbc wrapped around to zero,
or close to zero --> awire->sbc is apparently much lower than the
glxDraw->lastEventSbc recorded for the previous epoch

--> We need to increment glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch to be one higher than the previous one.

--> Case a) also handles the old DRI2 behaviour.

b) Previous sbc event was above closest 2^32 boundary, but now a
late event from the previous 2^32 epoch arrives, with a true sbc
that belongs to the previous 2^32 segment, so the awire->sbc of
this late event has a high count close to 2^32, whereas
glxDraw->lastEventSbc is closer to zero --> awire->sbc is much
greater than glXDraw->lastEventSbc.

--> We need to decrement glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch back to the previous lower epoch of this late
completion event.

We assume such a wraparound to a higher (a) epoch or lower (b)
epoch has happened if awire->sbc and glxDraw->lastEventSbc differ
by more than 2^30 counts, as such a difference can only happen
on wraparound, or if somehow 2^30 present requests would be pending
for a given drawable inside the server, which is rather unlikely.

v2: Explain the reason for this patch and the new wraparound handling
    much more extensive in commit message, no code change wrt. initial
    version.

Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

r600g: constify r600_shader_tgsi_instruction lists.

Massive list of constant data. Annotate it as such.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3}

Both of which are no longer used. Use designated initializer to make
things obvious as people add/remove TGSI_OPCODEs.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

r600g: use the tgsi opcode from parse.FullToken.FullInstruction

... rather than the local one in inst_info->tgsi_opcode.

This will allow us to simplify struct r600_shader_tgsi_instruction.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

i965/fs: Apply gl_FrontFacing ? -1 : 1 optimization only for floats

At the very least, unreal4/sun-temple/102.shader_test uses this pattern
for a signed integer result.  However, that shader did not hit the
optimization in the first place because it uses !gl_FrontFacing.  I
changed the shader to use remove the logical-not and reverse the other
operands.  I verified that incorrect code is generated before this
change and correct code is generated after.

Fixes fs-frontfacing-ternary-1-neg-1.shader_test.

No shader-db changes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Change try_opt_frontfacing_ternary to eliminate asserts

If we check for the case that is actually necessary, the asserts
become superfluous.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Handle CMP.nz ... 0 and AND.nz ... 1 similarly in cmod propagation

Espically on platforms that do not natively generate 0u and ~0u for
Boolean results, we generate a lot of sequences where a CMP is
followed by an AND with 1.  emit_bool_to_cond_code does this, for
example.  On ILK, this results in a sequence like:

    add(8)          g3<1>F          g8<8,8,1>F      -g4<0,1,0>F
    cmp.l.f0(8)     g3<1>D          g3<8,8,1>F      0F
    and.nz.f0(8)    null            g3<8,8,1>D      1D
    (+f0) iff(8)    Jump: 6

The AND.nz is obviously redundant.  By propagating the cmod, we can
instead generate

    add.l.f0(8)     null            g8<8,8,1>F      -g4<0,1,0>F
    (+f0) iff(8)    Jump: 6

Existing code already handles the propagation from the CMP to the ADD.

Shader-db results:

GM45 (0x2A42):
total instructions in shared programs: 3550829 -> 3550788 (-0.00%)
instructions in affected programs:     10028 -> 9987 (-0.41%)
helped:                                24

Iron Lake (0x0046):
total instructions in shared programs: 4993146 -> 4993105 (-0.00%)
instructions in affected programs:     9675 -> 9634 (-0.42%)
helped:                                24

Ivy Bridge (0x0166):
total instructions in shared programs: 6291870 -> 6291794 (-0.00%)
instructions in affected programs:     17914 -> 17838 (-0.42%)
helped:                                48

Haswell (0x0426):
total instructions in shared programs: 5779256 -> 5779180 (-0.00%)
instructions in affected programs:     16694 -> 16618 (-0.46%)
helped:                                48

Broadwell (0x162E):
total instructions in shared programs: 6823088 -> 6823014 (-0.00%)
instructions in affected programs:     15824 -> 15750 (-0.47%)
helped:                                46

No chage on Sandy Bridge or on any platform when NIR is used.

v2: Add unit tests suggested by Matt.  Remove spurious writes_flag()
check on scan_inst when scan_inst is known to be BRW_OPCODE_CMP (also
suggested by Matt).

v3: Fix some comments and remove some explicit int() casts in fs_reg
constructors in the unit tests.  Both suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965: Mark paths in linear <-> tiled functions as unreachable().

text    data     bss     dec     hex filename
9663       0       0    9663    25bf intel_tiled_memcpy.o   before
8215       0       0    8215    2017 intel_tiled_memcpy.o   after

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

egl: Remove eglQueryString virtual dispatch.

Reviewed-by: Chad Versace <chad.versace@intel.com>

main: Correct _mesa_error with no format in bufferobj.c.

This fixes Bug 89616, a build failure due to line 1639 of bufferobj.c:
_mesa_error(ctx, GL_INVALID_OPERATION, func);

Trivial.

main: Cosmetic changes to GetBufferSubData.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry point for GetNamedBufferSubData.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Cosmetic updates to GetBufferPointerv.

v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry point for GetNamedBufferPointerv.

v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry points for GetNamedBufferParameteri[64]v.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Refactor GetBufferParameteri[64]v.

v2: Split into a refactor commit and an entry point commit.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry point for FlushMappedNamedBufferRange.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Refactor FlushMappedBufferRange.

v2:-Remove "_mesa" from in front of static software fallback.
-Split out the refactor from the addition of the DSA entry points.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry point for UnmapNamedBuffer.

v2: review from Ian Romanick
- Restore VBO_DEBUG and BOUNDS_CHECK
- Remove _mesa from static software fallback unmap_buffer.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry points for MapNamedBuffer[Range].

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Refactor MapBuffer[Range].

v2: review from Jason Ekstrand
   - Split refactor from addition of DSA entry points.
    review from Ian Romanick
   - Remove "_mesa" from static software fallback map_buffer_range
   - Restore VBO_DEBUG and BOUNDS_CHECK

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Minor whitespace fixes in ClearNamedBuffer[Sub]Data.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>

main: Add entry points for ClearNamedBuffer[Sub]Data.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Refactor ClearBuffer[Sub]Data.

v2: review by Jason Ekstrand
- Split refactor of clear buffer sub data from addition of DSA entry
points.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Add entry point for CopyNamedBufferSubData.

v2: remove _mesa in front of static software fallback.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Improve errors and style in BufferSubData.

- More explicit error reporting.
- Removed legacy style.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Add entry point for NamedBufferSubData.

v2: review by Ian Romanick
   - Remove "_mesa" from name of static software fallback buffer_sub_data.
   - Remove mappedRange from _mesa_buffer_sub_data.
   - Removed some cosmetic changes to a separate commit.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Add entry point for NamedBufferData.

v2: review from Ian Romanick
   - Fix space in ARB_direct_state_access.xml.
   - Remove "_mesa" from the name of buffer_data static fallback.
   - Restore VBO_DEBUG and BOUNDS_CHECK.
   - Fix beginning of comment to start on same line as /*

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Add entry point for NamedBufferStorage.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

main: Add entry point for CreateBuffers.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

Revert "main: _mesa_cube_level_complete checks NumLayers."

This reverts commit 1ee000a0b6737d6c140d4f07b6044908b8ebfdc7.
Failures with the GLES3 conformance suite and Synmark2 OGLHdrBloom revealed
that this commit was in error.

Extensive testing with Piglit prior to patch review and upstreaming did not
reveal this problem because, in the few Piglit tests that test for cube
completeness, NumLayers = 6. This is because all of the existing tests use
TextureStorage to initialize the texture, which sets NumLayers.

A new Piglit test has been sent to the mailing list that reproduces the bug
related to this patch ("texturing: Testing
glGenerateMipmap(GL_TEXTURE_CUBE_MAP) without glTexStorage2D").

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/skl: Send a message header when doing constant loads SIMD4x2

Commit 0ac4c272755c7 made it add a header for the send message when
using SIMD4x2 on Skylake because without this it will end up using
SIMD8D. However the patch missed the case when a sampler is being used
to implement constant loads from a buffer surface in a SIMD4x2 vertex
shader.

This fixes 29 Piglit tests, mostly related to the ARL instruction in
vertex programs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>

i965/fs: in MAD optimizations, switch last argument to be immediate

Commit bb33a31 introduced optimizations that transform cases of MAD
in to simpler forms but it did not take in to account that src[0]
can not be immediate and did not report progress. Patch switches
src[0] and src[1] if src[0] is immediate and adds progress
reporting. If both sources are immediates, this is taken care of by
the same opt_algebraic pass on later run.

v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>

common.py: Fix PEP 8 issues.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>

gallivm: abort properly when running out of buffer space in lp_disassembly

Before this actually ran into an infinite loop printing out "invalid"...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

docs/GL3: also mark GLES3/GS5 for radeonsi as done

st/dri: remove unused include from the automake/scons build

st/dri/common hasn't been around for a while.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

auxiliary/os: fix the android build - s/drm_munmap/os_munmap/

Squash this silly typo introduced with commit c63eb5dd5ec(auxiliary/os: get
the mmap/munmap wrappers working with android)

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

gallium/sw/kms: trivial cleanups

Remove the forward declaration and make use of the DEBUG_PRINT macro for
debug builds.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

loader: include <sys/stat.h> for non-sysfs builds

Required by fstat(), otherwise we'll error out due to implicit function
declaration.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89530
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Vadim Rutkovsky <vrutkovs@redhat.com>
Tested-by: Vadim Rutkovsky <vrutkovs@redhat.com>

c11/threads: Use PTHREAD_MUTEX_RECURSIVE by default

Previously PTHREAD_MUTEX_RECURSIVE_NP had been used on linux for
compatibility with old glibc. Since mesa defines __GNU_SOURCE__
on linux PTHREAD_MUTEX_RECURSIVE is also available since at least
1998. So we can unconditionally use the portable version
PTHREAD_MUTEX_RECURSIVE.

Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88534
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

radeonsi: implement TGSI_OPCODE_BFI (v2)

v2: Don't use the intrinsics, the shader backend can recognize these
patterns and generates optimal code automatically.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

radeonsi: add a helper for extracting bitfields from parameters (v2)

This will be used a lot (especially by tessellation).

v2: don't use the bfe intrinsic

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

i965: Emit IF/ELSE/ENDIF/WHILE JIP with type W on Gen7

IvyBridge and Haswell PRM say that the JIP should be emitted
with type W but we were using UD. The previous implementation
did not show adverse effects, but IMHO it is safer to follow
the specification thoroughly.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>

radeonsi: move scratch reloc state setup

- move it to its own function
- do it after all states are emitted
- bump SI_MAX_DRAW_CS_DWORDS

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering lines

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change

Do it only when the line stipple state is changed.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state

This requires enabling the optional GL provoking vertex behavior for quads.

+ some cosmetic changes, so that the register is set exactly the same as
on r600.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: implement line and polygon smoothing

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: add shader code for smoothing

The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: split sample locations into its own state atom

Sample locations are not updated as often as framebuffers.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: add basic code for overrasterization

This will be used for line and polygon smoothing.
This is GCN-only even though it's in shared code.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: small cleanup in si_shader_selector_key

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogue

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: add support for easy opcodes from ARB_gpu_shader5

I have to use the BFE instrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.

In the isel DAG, lowered BFE has 27 nodes (including leafs).

radeonsi: implement bit-finding opcodes from ARB_gpu_shader5

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>

radeonsi: implement gl_SampleMaskIn

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>

radeonsi: add support for SQRT

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>

radeonsi: add support for FMA

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>

gallium/radeon: don't use LLVMReadOnlyAttribute for ALU

None of the instructions use a pointer argument.
(+ small cosmetic changes)

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

tgsi: handle bitwise opcodes in tgsi_opcode_infer_type (v2)

v2: set the same types as the destination type in tgsi_exec

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

gallium: add FMA and DFMA opcodes (v3)

Needed by ARB_gpu_shader5.

v2: select DMAD for FMA with double precision
v3: add and select DFMA

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

freedreno: update generated headers

Fix a3xx texture layer-size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

freedreno/ir3: remove old compiler

Now that piglit is no longer falling back to old compiler for any tests,
we can remove it. Hurray \o/

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: avoid scheduler deadlock

Deadlock can occur if we schedule an address register write, yet some
instructions which depend on that address register value also depend on
other unscheduled instructions that depend on a different address
register value. To solve this, before scheduling an address register
write, ensure that all the other dependencies of the instructions which
consume this address register are already scheduled.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: bit of cleanup

Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

i965: De-duplicate is_expression_commutative() functions.

Create a backend_inst::is_commutative() method to replace two static
functions that did the exact same thing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965/gen4-5: Cope with immutable-format texture revalidation

This is unfortunately sometimes necessary due to rebasing levels when
rendering into them.

16 piglits crash -> pass, when building mesa with debug enabled.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

docs: add news item and link release notes for mesa 10.5.1

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

docs: Add sha256 sums for the 10.5.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 2abba086ca84f200fae940129c0a5342c3748f00)

Add release notes for the 10.5.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 11c0ff60ef19cca84452aa989fb8bb25127473e0)

freedreno: fix slice pitch calculations

For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).

This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>

freedreno/a3xx: use the same layer size for all slices

We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test

bin/texelFetch fs sampler2DArray

have the same breakage as its non-array version instead of being
completely off, and makes

bin/ext_texture_array-gen-mipmap

start passing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>

i965/vs: Add missing resolve_bool_comparison calls on GEN4 and GEN5

The ir_unop_any problem was discovered by some later optimization passes
that generate ir_triop_csel.  I was also able to reproduce it by
modifying the gl-2.0-vertexattribpointer vertex shader to generate its
result using

   color = mix(vec4(0, 1, 0, 0),
               vec4(1, 0, 0, 0),
               bvec4(any(greaterThan(diff, vec4(tolerance)))));

instead of an if-statement.  This also required using #version 130 and
MESA_GLSL_VERSION_OVERRIDE=130.

I have not nominated this for stable releases because I don't think
there's any way to trigger the problem without GLSL 1.30 or
optimizations that don't exist in stable.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@intel.com>

i965/disasm: Fix format strings

Most of the brw_inst_* api returns 64bit values. This fixes disassembly
of sampler messages, etc.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/disasm: Mark format() as being printf-style.

This allows us to get warnings from GCC when we mess up the format
strings.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>

docs: List ARB_shading_language_packing/EXT_shader_integer_mix.

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl: Expose built-in packing functions under GLSL 4.2.

ARB_shading_language_packing is part of GLSL 4.2, not 4.0 as I
mistakenly believed. The following functions are available only with
ARB_shading_language_packing, GLSL 4.2 (not GLSL 4.0), or ES 3.0:

   - packSnorm2x16
   - unpackSnorm2x16
   - packHalf2x16
   - unpackHalf2x16

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

egl: Create queryable strings in eglInitialize().

Creating/recreating the strings in eglQueryString() is extra work and
isn't thread-safe, as exhibited by shader-db's run.c using libepoxy.

Multiple threads in run.c call eglReleaseThread() around the same time.
libepoxy calls eglQueryString() to determine whether eglReleaseThread()
exists, and our EGL implementation passes a pointer to the version
string to libepoxy while simultaneously overwriting the string, leading
to a failure in libepoxy.

Moreover, the EGL spec says (emphasis mine):

"eglQueryString returns a pointer to a *static*, zero-terminated string"

This patch moves some auxiliary functions from eglmisc.c to eglapi.c so
that they may be used to create the extension, API, and version strings
once during eglInitialize(). The auxiliary functions are renamed from
_eglUpdate* to _eglCreate*, and some checks made unnecessary by calling
the functions from eglInitialize() are removed.

Reviewed-by: Chad Versace <chad.versace@intel.com>

glsl: optimize (0 cmp x + y) into (-x cmp y).

The optimization done by commit 34ec1a24d did not take it into account.

Fixes:

dEQP-GLES3.functional.shaders.random.all_features.fragment.20

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

mesa: Check for valid PBO access in gl(Compressed)Tex(Sub)Image calls

This patch adds two types of checks to the gl(Compressed)Tex(Sub)Imgage family
of functions when a pixel buffer object is bound to GL_PIXEL_UNPACK_BUFFER:

- That the buffer is not mapped.
- The total data size is within the boundaries of the buffer size.

It does so by calling auxiliary validations functions from PBO API:
_mesa_validate_pbo_source() for non-compressed texture calls, and
_mesa_validate_pbo_source_compressed() for compressed texture calls.

The first check is defined in Section 6.3.2 'Effects of Mapping Buffers
on Other GL Commands' of the GLES 3.1 spec, page 57:

    "Any GL command which attempts to read from, write to, or change the
     state of a buffer object may generate an INVALID_OPERATION error if all
     or part of the buffer object is mapped. However, only commands which
     explicitly describe this error are required to do so. If an error is not
     generated, using such commands to perform invalid reads, writes, or
     state changes will have undefined results and may result in GL
     interruption or termination."

Similar wording exists in GL 4.5 spec, page 76.

In the case of gl(Compressed)Tex(Sub)Image(2,3)D, the specification doesn't force
implemtations to throw an error. However since Mesa don't currently implement
checks to determine when it is safe to read/write from/to a mapped PBO, we
should always return the error if all or parts of it are mapped.

The 2nd check is defined in Section 8.5 'Texture Image Specification' of the
OpenGL 4.5 spec, page 203:

    "An INVALID_OPERATION error is generated if a pixel unpack buffer object
     is bound and storing texture data would access memory beyond the end of
     the pixel unpack buffer."

Fixes 4 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target

Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>

mesa: Separate PBO validation checks from buffer mapping, to allow reuse

Internal PBO functions such as _mesa_map_validate_pbo_source() and
_mesa_validate_pbo_compressed_teximage() perform validation and buffer mapping
within the same call.

This patch takes out the validation into separate functions to allow reuse
of functionality by other code (i.e, gl(Compressed)Tex(Sub)Image).

Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>

mesa: Set the correct image size in _mesa_validate_pbo_access()

_mesa_validate_pbo_access() provides a generic way to check that a
requested pixel transfer operation on a PBO falls within the
boundaries of the buffer. It is used in various other places, and
depending on the caller, some arguments are used or not.

In particular, the 'clientMemSize' argument is used only by calls
that are knowledgeable of the total size of the user data involved
in a pixel transfer, such as the case of compressed texture image
calls. Other calls don't provide 'clientMemSize' directly since it
is made implicit from the size and format of the texture, and its
data type. In these cases, a sufficiently big value is passed to
'clientMemSize' (INT_MAX) to avoid an incorrect constrain.

The problem is that _mesa_validate_pbo_access() use uint
pointers to make the calculations, which are 64 bits long in 64
bits platforms, meanwhile the dummy INT_MAX passed in 'clientMemSize'
is just 32 bits. This causes a constrain that is not desired.

This patch fixes that by checking that if 'clientMemSize' is MAX_INT,
then UINTPTR_MAX is assumed instead.

This is an ugly workaround to the fact that _mesa_validate_pbo_access()
intends to be a one function fits all. The clean solution here would
be to break it into different functions that provide the adequate API
for each of the possible code paths and validation needs.

Since there are callers relying on passing INT_MAX to 'clientMemSize',
this patch is necessary to deal with the problem above while a cleaner
implementation of the PBO API is not implemented.

Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>

meta: Remove error checks for texture <-> pixel-buffer transfers that don't belong in driver code

The implementation of texture <-> pixel-buffer transfers in drivers common layer
includes certain error checks and argument validation that don't belong there,
considering how the Mesa codebase is laid out. These are higher level
validations that, if necessary, should be performed earlier (i.e, in GL API
entry points).

This patch simply removes these error checks from driver code.

For more information, see discussion at
http://lists.freedesktop.org/archives/mesa-dev/2015-February/077417.html.

Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>

util: convert slab macros to inline functions

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

egl: fix cast to silence compiler warning

eglcurrent.c: In function '_eglSetTSD':
eglcurrent.c:57:4: warning: passing argument 2 of 'tss_set' discards
'const' qualifier from pointer target type [enabled by default]
    tss_set(_egl_TSD, (const void *) t);
    ^
In file included from ../../../include/c11/threads.h:72:0,
                 from eglcurrent.c:32:
../../../include/c11/threads_posix.h:357:1: note: expected 'void *'
but argument is of type 'const void *'
tss_set(tss_t key, void *val)
^

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

gallivm: (trivial) Fix typo in comment introduced by 70dc8a

Fix typo in comment introduced by 70dc8a

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>

mesa: improve ARB_copy_image internal format compat check

The memory layout of compatible internal formats may differ in bytes per
block, so TexFormat is not a reliable measure of compatibility. For example,
GL_RGB8 and GL_RGB8UI are compatible formats, but GL_RGB8 may be laid out in
memory as B8G8R8X8. If GL_RGB8UI has a 3 byte-per-block memory layout, the
existing compatibility check will fail.

Additionally, the current check allows any two compressed textures which share
block size to be used, whereas the spec gives an explicit table of compatible
formats.

v2: Use a switch instead of array iteration for block class and show the
    correct GL error when internal formats are mismatched.
v3: Include spec citations for new compatibility checks, rearrange check
    order to ensure that compressed, view-compatible formats return the
    correct result, and make style fixes. Original commit message amended
    for clarity.
v4: Reformatted spec citations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Fix non-determinism in nir_lower_vars_to_ssa().

Previously, we stored derefs in a hash table, using the malloc'd pointer
as the key.  Then, we walked through the hash table and generated code,
based on the order of the hash table's elements.

Memory addresses returned by malloc are pretty much random, which meant
that the hash was random, and the hash table's elements would be walked
in some random order.  This led to successive compiles of the same
shader using different variable names and slightly different orderings
of phi-nodes.  Code could not be diff'd, and the final assembly would
sometimes change slightly too.

It turns out the only point of the hash table was to avoid inserting
the same node multiple times for different dereferences.  We never
actually searched the hash table!  This patch uses an intrusive
linked list instead.  Since exec_list uses head and tail sentinels,
checking prev or next against NULL will tell us whether the node is
already in the list.

Pair programming with Jason Ekstrand.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

util: Fix foreach_list_typed_safe when exec_node is not at offset 0.

__next and __prev are pointers to the structure containing the exec_node
link, not the embedded exec_node.  NULL checks would fail unless the
embedded exec_node happened to be at offset 0 in the parent struct.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Use "(__node)->__field.next != NULL" to check for the end of the list
   instead of the "&__next->__field != NULL".  The former is far more
   obviously correct as it matches what the non-safe versions do.  The
   original code tried to avoid any use of __next as the client code may
   delete it during its execution.  However, since the looping condition is
   checked after the iteration clause but before the client code is
   executed, we know that __node is valid during the looping condition.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Use NIR for scalar VS when INTEL_USE_NIR is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>