mesa.git
Emil Velikov [Wed, 28 Oct 2015 12:38:35 +0000 (12:38 +0000)]
winsys/virgl: throw in some inline wrappers

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:36:00 +0000 (11:36 +0000)]
virgl: introduce virgl_query() inline wrapper

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:21:49 +0000 (11:21 +0000)]
virgl: use virgl_screen/surface upcast wrappers

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:14:02 +0000 (11:14 +0000)]
virgl: introduce and use virgl_transfer/texture/resource inline wrappers

The only two remaining (struct virgl_resource *) casts require a
closer look: either the error checking is missing or the arguments
provided look wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:48:31 +0000 (10:48 +0000)]
virgl: add virgl_context/sampler_view/so_target() upcast wrappers

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
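The upcast wrappers introduced in this series replace open-coded casts with small inline helpers. A minimal sketch of the pattern, assuming simplified stand-in structs (the real pipe_resource and virgl_resource carry many more fields, and the member name `base` is an assumption here). The cast is only valid because the gallium base struct is the first member of the virgl one.

```c
#include <stddef.h>

/* Simplified stand-ins for the gallium base type and the virgl subclass;
 * the real structs carry many more fields. */
struct pipe_resource {
   unsigned width0;
};

struct virgl_resource {
   struct pipe_resource base;   /* must be the first member for the cast */
   unsigned handle;
};

/* Upcast wrapper: one greppable helper instead of open-coded
 * (struct virgl_resource *) casts scattered across call sites. */
static inline struct virgl_resource *
virgl_resource(struct pipe_resource *r)
{
   return (struct virgl_resource *)r;
}
```

Callers then write `virgl_resource(pres)` instead of `(struct virgl_resource *)pres`. Because the helper takes a `struct pipe_resource *`, passing an unrelated pointer type draws a compiler diagnostic instead of being silently accepted by a bare cast.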
Emil Velikov [Wed, 28 Oct 2015 11:57:55 +0000 (11:57 +0000)]
winsys/virgl/drm: drop unneeded forward declaration

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:21:54 +0000 (10:21 +0000)]
virgl: remove sw_winsys pointer from virgl_screen

The screen already has a pointer to the (base) winsys object, which is
implemented/sub-classed as either a drm- or a sw-based one, depending on
the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Thu, 29 Oct 2015 10:10:35 +0000 (10:10 +0000)]
virgl: rename virgl.h to virgl_screen.h

Provide a more meaningful name, considering its purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 14:39:45 +0000 (14:39 +0000)]
virgl: move virgl_hw.h into the driver dir

Strictly speaking, virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency:

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:53:36 +0000 (11:53 +0000)]
virgl: straighten the includes confusion

Use the relevant GALLIUM_foo_CFLAGS, which have all the requirements
(not to mention VISIBILITY_CFLAGS), and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:05:27 +0000 (10:05 +0000)]
virgl: remove the _FILE_OFFSET_BITS defines

The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:51:47 +0000 (11:51 +0000)]
winsys/virgl/drm: add all files to the tarball

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 10:08:25 +0000 (10:08 +0000)]
winsys/virgl/vtest: list all files in Makefile.sources

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Mon, 26 Oct 2015 11:36:50 +0000 (11:36 +0000)]
virgl: move sources list to Makefile.sources

... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Wed, 28 Oct 2015 11:47:18 +0000 (11:47 +0000)]
virgl: fix drm.h include path

The drm/ prefix is required when using the kernel-provided headers. As
most distros don't ship them, and we already depend on libdrm (which
adds the relevant -I flag), just drop the drm/ prefix from the include.

Once a libdrm release ships the virtgpu_drm.h header, we can drop our
local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Fri, 30 Oct 2015 17:23:18 +0000 (17:23 +0000)]
i965: enable ARB_shader_clock on gen7+

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Wed, 7 Oct 2015 10:50:01 +0000 (11:50 +0100)]
i965: Implement nir_intrinsic_shader_clock

v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
   SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 9 Oct 2015 09:40:35 +0000 (10:40 +0100)]
i965/fs: move the fs_reg::smear() from get_timestamp() to the callers

We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Wed, 7 Oct 2015 10:59:26 +0000 (11:59 +0100)]
nir: add shader_clock intrinsic

v2: Add flags and inline comment/description.
v3: None of the inputs/outputs are variables.
v4: Drop clockARB reference, relate the code motion barrier comment to the
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 2 Oct 2015 09:25:51 +0000 (10:25 +0100)]
glsl: add support for the clock2x32ARB function

v2: correctly set the return type

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
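On the GLSL side, clock2x32ARB() hands back a 64-bit counter split into a uvec2. A host-side C sketch of how two samples are typically recombined and subtracted, assuming (as with packUint2x32()) that the first component holds the low 32 bits; check the ARB_shader_clock specification before relying on that ordering. `struct uvec2` and both helpers are hypothetical.

```c
#include <stdint.h>

/* Host-side model of the uvec2 returned by clock2x32ARB(). */
struct uvec2 {
   uint32_t x;   /* assumed: low 32 bits of the counter */
   uint32_t y;   /* assumed: high 32 bits of the counter */
};

/* Recombine the two halves into a 64-bit tick count. */
static uint64_t
clock2x32_to_u64(struct uvec2 v)
{
   return ((uint64_t)v.y << 32) | v.x;
}

/* Elapsed ticks between two samples. Working on the recombined 64-bit
 * values keeps a carry from the low into the high word between the two
 * reads from producing a bogus result. */
static uint64_t
clock_elapsed(struct uvec2 start, struct uvec2 end)
{
   return clock2x32_to_u64(end) - clock2x32_to_u64(start);
}
```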
Emil Velikov [Fri, 2 Oct 2015 08:56:37 +0000 (09:56 +0100)]
glsl: add ARB_shader_clock infrastructure

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Fri, 2 Oct 2015 08:49:47 +0000 (09:49 +0100)]
mesa: add infra for ARB_shader_clock

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Samuel Pitoiset [Sat, 17 Oct 2015 09:24:50 +0000 (11:24 +0200)]
nv50: do not create an invalid HW query type

While we are at it, store the rotate offset for occlusion queries in
nv50_hw_query, as on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Fri, 16 Oct 2015 23:04:27 +0000 (01:04 +0200)]
nv50: move HW queries to nv50_query_hw.c/h files

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Sun, 18 Oct 2015 16:33:41 +0000 (18:33 +0200)]
nv50: move nva0_so_target_save_offset() to its correct location

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Samuel Pitoiset [Fri, 16 Oct 2015 22:14:28 +0000 (00:14 +0200)]
nv50: add a header file for nv50_query

Like for nvc0, this will allow splitting the different types of queries
and prepare the way for both global performance counters and MP counters.

While we are at it, use the nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:53 +0000 (11:42 +0000)]
st/va: add support to export a surface as dmabuf

That is, it implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

and applies the related changes to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer

Implementation inspired by cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:52 +0000 (11:42 +0000)]
st/va: implement VaDeriveImage

It also applies the related changes to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper VA buffer type and struct
for this. The only option is VAImageBufferType, which is normally
used for plain user data arrays. One consequence is that VaDeriveImage
is only useful on surfaces backed by contiguous planes.

Implementation inspired by cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
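To see why contiguous planes matter: a derived image is described by one buffer plus per-plane offsets and pitches, so every plane must live at a fixed offset inside that single allocation. A sketch for NV12, assuming a hypothetical `derived_image_layout` struct loosely modeled on the pitches/offsets a VAImage exposes, a pitch shared by both planes, and the interleaved UV plane placed directly after the Y plane.

```c
#include <stdint.h>

/* Hypothetical layout record, loosely modeled on the per-plane
 * pitches/offsets that a VAImage exposes. */
struct derived_image_layout {
   uint32_t num_planes;
   uint32_t pitches[3];
   uint32_t offsets[3];
   uint32_t data_size;
};

/* Describe an NV12 surface whose luma and interleaved chroma planes sit
 * back to back in one buffer: Y takes pitch * height bytes and the UV
 * plane follows immediately, with the same pitch and half the rows
 * (rounded up for odd heights). */
static struct derived_image_layout
nv12_contiguous_layout(uint32_t pitch, uint32_t height)
{
   struct derived_image_layout l = {0};
   uint32_t chroma_rows = (height + 1) / 2;

   l.num_planes = 2;
   l.pitches[0] = pitch;
   l.pitches[1] = pitch;
   l.offsets[0] = 0;
   l.offsets[1] = pitch * height;            /* UV starts right after Y */
   l.data_size  = pitch * height + pitch * chroma_rows;
   return l;
}
```

If the two planes were separate allocations, no single offset from one buffer base could describe the UV plane, which is presumably why the approach only works for contiguous surfaces.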
Julien Isorce [Fri, 30 Oct 2015 11:42:51 +0000 (11:42 +0000)]
st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:50 +0000 (11:42 +0000)]
st/va: add headless support, i.e. VA_DISPLAY_DRM

This patch allows using gallium vaapi without requiring an X server
to be running for your second graphics card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:49 +0000 (11:42 +0000)]
st/va: handle Video Post Processing for configs

Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the following four functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:48 +0000 (11:42 +0000)]
st/va: add colorspace conversion through Video Post Processing

Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:47 +0000 (11:42 +0000)]
st/va: implement dmabuf import for VaCreateSurfaces2

For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:46 +0000 (11:42 +0000)]
st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes

Inspired by http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that use
this newer libva API. The first advantage of using VaCreateSurfaces2
over the existing VaCreateSurfaces is that it is possible to select
the pixel format for the surface. Indeed, with the simple VaCreateSurfaces
function it is only possible to create an NV12 surface. It can be useful
to create an RGBA surface to use with video post processing.

The available pixel formats can be queried with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Julien Isorce [Fri, 30 Oct 2015 11:42:45 +0000 (11:42 +0000)]
st/va: do not destroy old buffer when new one failed

If the formats are not the same, vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails, the surface loses its current buffer.
So only destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
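The fix follows the usual create-then-swap ordering. A minimal sketch with hypothetical `surface`/`video_buffer` types and a `simulate_failure` flag standing in for a failed allocation; it is not the vlVaPutImage code itself.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a video buffer and the surface owning it. */
struct video_buffer {
   int format;
};

struct surface {
   struct video_buffer *buffer;
};

/* Allocation hook; 'fail' lets the example simulate an error. */
static struct video_buffer *
create_buffer(int format, bool fail)
{
   if (fail)
      return NULL;
   struct video_buffer *b = malloc(sizeof(*b));
   if (b)
      b->format = format;
   return b;
}

/* Re-create the surface's buffer in a new format. The old buffer is only
 * released once its replacement exists, so a failed allocation leaves the
 * surface exactly as it was. Returns false on failure. */
static bool
surface_change_format(struct surface *s, int format, bool simulate_failure)
{
   if (s->buffer && s->buffer->format == format)
      return true;                        /* nothing to do */

   struct video_buffer *fresh = create_buffer(format, simulate_failure);
   if (!fresh)
      return false;                       /* keep the current buffer */

   free(s->buffer);
   s->buffer = fresh;
   return true;
}
```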
Julien Isorce [Fri, 30 Oct 2015 11:42:44 +0000 (11:42 +0000)]
st/va: properly defines VAImageFormat formats and improve VaCreateImage

Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.

Note that gstreamer-vaapi checks all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
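A sketch of the two-way mapping these helpers implement, with hypothetical names: `sketch_format` stands in for gallium's pipe_format, the table covers only three formats, and SKETCH_FOURCC assumes the same byte packing as libva's VA_FOURCC() macro (first character in the low byte). In the same spirit as ChromaToPipe now defaulting to PIPE_VIDEO_CHROMA_FORMAT_NONE, unknown inputs map to an explicit "none" value rather than a guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match libva's VA_FOURCC(): first character in the low byte. */
#define SKETCH_FOURCC(a, b, c, d) \
   ((uint32_t)(uint8_t)(a) | ((uint32_t)(uint8_t)(b) << 8) | \
    ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* Hypothetical stand-in for gallium's enum pipe_format. */
enum sketch_format {
   SKETCH_FORMAT_NONE = 0,
   SKETCH_FORMAT_NV12,
   SKETCH_FORMAT_B8G8R8A8,
   SKETCH_FORMAT_R8G8B8A8,
};

/* One table drives both directions of the mapping. */
static const struct {
   uint32_t fourcc;
   enum sketch_format format;
} fourcc_table[] = {
   { SKETCH_FOURCC('N', 'V', '1', '2'), SKETCH_FORMAT_NV12 },
   { SKETCH_FOURCC('B', 'G', 'R', 'A'), SKETCH_FORMAT_B8G8R8A8 },
   { SKETCH_FOURCC('R', 'G', 'B', 'A'), SKETCH_FORMAT_R8G8B8A8 },
};

#define FOURCC_TABLE_SIZE (sizeof(fourcc_table) / sizeof(fourcc_table[0]))

/* fourcc -> format; unknown codes map to NONE. */
static enum sketch_format
fourcc_to_format(uint32_t fourcc)
{
   for (size_t i = 0; i < FOURCC_TABLE_SIZE; i++)
      if (fourcc_table[i].fourcc == fourcc)
         return fourcc_table[i].format;
   return SKETCH_FORMAT_NONE;
}

/* format -> fourcc; 0 means "no mapping". */
static uint32_t
format_to_fourcc(enum sketch_format format)
{
   for (size_t i = 0; i < FOURCC_TABLE_SIZE; i++)
      if (fourcc_table[i].format == format)
         return fourcc_table[i].fourcc;
   return 0;
}
```

Keeping a single table for both directions means a newly added format cannot be known to one helper and missing from the other.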
Samuel Iglesias Gonsalvez [Tue, 27 Oct 2015 13:21:12 +0000 (14:21 +0100)]
main: fix basename match's check if it's an array or struct

Commit 4565b6f did not update the basename match's check for
the case where the string would exactly match the name of the
variable if the suffix "[0]" were appended to it.

Fixes two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
  read. Reported by Tapani Pälli.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
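A hypothetical helper showing the matching rule the fix restores: a query matches a resource either exactly or when appending "[0]" to the query yields the resource name. It is a sketch, not Mesa's implementation; it only reads the character after the shared prefix once the prefix compare has succeeded, which keeps that access in bounds, in the spirit of the v2 note.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does 'query' name 'resource_name', either exactly
 * or as the resource name minus a trailing "[0]"?
 * e.g. the resource "block[0]" is matched by the query "block". */
static bool
resource_name_matches(const char *resource_name, const char *query)
{
   size_t len = strlen(query);

   /* strncmp stops at the first difference, so after a successful compare
    * resource_name is known to hold at least len characters, and reading
    * resource_name[len] (possibly its terminating NUL) stays in bounds. */
   if (strncmp(resource_name, query, len) != 0)
      return false;

   return resource_name[len] == '\0' ||
          strcmp(resource_name + len, "[0]") == 0;
}
```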
Kristian Høgsberg [Wed, 28 Oct 2015 17:58:09 +0000 (10:58 -0700)]
i965: Fix invalid memory accesses after resizing brw_codegen's store table

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Connor Abbott [Tue, 9 Jun 2015 17:26:53 +0000 (10:26 -0700)]
i965/sched: use liveness analysis for computing register pressure

Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.

Shader-db results on bdw:

total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1

total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873

LOST:   8
GAINED: 0

v2: use regs_read() in more places.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
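The core idea, reading register pressure straight off liveness-derived live ranges, can be sketched with a difference array. This assumes hypothetical inclusive [start, end] instruction ranges with a per-value register size; the real scheduler works per basic block and also models payload registers via compute_payload_ranges(), which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical live range of one virtual register, in instruction
 * indices (inclusive), as liveness analysis would report it. */
struct live_range {
   int start;
   int end;    /* must be < num_insts */
   int size;   /* number of hardware registers the value occupies */
};

/* Fill pressure[ip] with the number of registers live at each
 * instruction, using a difference array: +size where a range begins,
 * -size one past where it ends, then a running sum. 'delta' must have
 * room for num_insts + 1 entries. Returns the peak pressure. */
static int
compute_pressure(const struct live_range *ranges, size_t num_ranges,
                 int num_insts, int *delta, int *pressure)
{
   for (int ip = 0; ip <= num_insts; ip++)
      delta[ip] = 0;

   for (size_t i = 0; i < num_ranges; i++) {
      delta[ranges[i].start] += ranges[i].size;
      delta[ranges[i].end + 1] -= ranges[i].size;
   }

   int live = 0, peak = 0;
   for (int ip = 0; ip < num_insts; ip++) {
      live += delta[ip];
      pressure[ip] = live;
      if (live > peak)
         peak = live;
   }
   return peak;
}
```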
Connor Abbott [Fri, 12 Jun 2015 19:01:35 +0000 (12:01 -0700)]
i965/fs: split out calculation of payload live ranges

We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Connor Abbott [Sat, 6 Jun 2015 14:55:21 +0000 (10:55 -0400)]
i965: dump scheduling cycle estimates

The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
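The loop heuristic amounts to weighting each instruction by a factor of 10 per enclosing loop. A sketch with a hypothetical `inst_estimate` record; treating all control flow as non-uniform means both sides of every branch are counted, which a plain sum over all instructions already does.

```c
#include <stddef.h>

/* Hypothetical per-instruction record: estimated cycles and how many
 * loops the instruction is nested in. */
struct inst_estimate {
   unsigned cycles;
   unsigned loop_depth;
};

/* Sum the cycle estimates, assuming every loop body runs 10 times, so an
 * instruction nested d loops deep is counted 10^d times. */
static unsigned long long
estimate_total_cycles(const struct inst_estimate *insts, size_t count)
{
   unsigned long long total = 0;
   for (size_t i = 0; i < count; i++) {
      unsigned long long weight = 1;
      for (unsigned d = 0; d < insts[i].loop_depth; d++)
         weight *= 10;
      total += weight * insts[i].cycles;
   }
   return total;
}
```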
Connor Abbott [Sat, 6 Jun 2015 17:32:21 +0000 (13:32 -0400)]
i965: always run the post-RA scheduler

Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the part that sets the latency to 1, we still want to always run the
post-RA scheduler, since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Connor Abbott [Sun, 7 Jun 2015 04:37:27 +0000 (00:37 -0400)]
i965/sched: write-after-read dependencies are free

Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
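A sketch of the latency rule this commit describes, with a hypothetical `dep_kind` enum and the writer's latency passed in by the caller. A zero-latency write-after-read edge still forces the order of the two instructions; it just adds no stall to the critical path.

```c
/* Kinds of dependency between an earlier and a later instruction. */
enum dep_kind {
   DEP_RAW,   /* later instruction reads what the earlier one wrote */
   DEP_WAW,   /* both write the same register */
   DEP_WAR,   /* later instruction overwrites what the earlier one read */
};

/* Latency to put on the dependency edge: the register scoreboard makes
 * WAW as costly as RAW, while WAR hazards are not checked by the EU, so
 * they only constrain ordering and cost nothing. */
static unsigned
dep_edge_latency(enum dep_kind kind, unsigned writer_latency)
{
   switch (kind) {
   case DEP_RAW:
   case DEP_WAW:
      return writer_latency;
   case DEP_WAR:
   default:
      return 0;
   }
}
```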
Connor Abbott [Fri, 5 Jun 2015 23:20:57 +0000 (19:20 -0400)]
i965: fix cycle estimates when there's a pipeline stall

The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
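A small arithmetic sketch of the ordering fix, with hypothetical helper names; `issue_done_old` is one reading of "had it backwards", not the literal old code. With the scheduler at cycle 10, sources ready at cycle 20 and an issue time of 4, the corrected estimate is max(10, 20) + 4 = 24, while adding the issue time first gives max(14, 20) = 20, which is short by exactly the issue time.

```c
static unsigned
max_u(unsigned a, unsigned b)
{
   return a > b ? a : b;
}

/* Cycle at which an instruction finishes issuing.
 *   now:        cycle the scheduler has reached
 *   ready_at:   cycle its sources become available (may be in the future)
 *   issue_time: cycles needed to push the instruction into the pipeline
 * Corrected order: wait out any stall first, then add the issue time. */
static unsigned
issue_done_fixed(unsigned now, unsigned ready_at, unsigned issue_time)
{
   return max_u(now, ready_at) + issue_time;
}

/* Backwards order: the issue time is added before the stall is taken
 * into account, so a long enough stall swallows it entirely. */
static unsigned
issue_done_old(unsigned now, unsigned ready_at, unsigned issue_time)
{
   return max_u(now + issue_time, ready_at);
}
```

Without a stall (sources already ready) the two orders agree, which is why the error only showed up when a pipeline stall was predicted.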
Eric Anholt [Tue, 28 Jul 2015 18:35:03 +0000 (11:35 -0700)]
vc4: Allow user index buffers, to avoid slow readback for shadow IBs.

Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).

Ilia Mirkin [Fri, 30 Oct 2015 03:25:08 +0000 (23:25 -0400)]
nv50: mark contexts shareable, compile at creation time

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Fri, 30 Oct 2015 02:18:25 +0000 (22:18 -0400)]
nv50: allow per-sample interpolation to be forced via rast

Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Matt Turner [Thu, 29 Oct 2015 23:08:45 +0000 (16:08 -0700)]
i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 02:05:56 +0000 (19:05 -0700)]
i965: Add INTEL_DEBUG=hex to print the hex with the disassembly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
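INTEL_DEBUG takes a comma-separated list of flag names, so the two new options combine as INTEL_DEBUG=hex,nocompact. A sketch of turning such a value into a bitmask, with hypothetical flag values and names; the driver has its own flag table and parsing helper.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the real driver defines its own DEBUG_* values. */
#define SKETCH_DEBUG_HEX        (1u << 0)
#define SKETCH_DEBUG_NOCOMPACT  (1u << 1)

struct debug_option {
   const char *name;
   uint32_t flag;
};

static const struct debug_option debug_options[] = {
   { "hex",       SKETCH_DEBUG_HEX },
   { "nocompact", SKETCH_DEBUG_NOCOMPACT },
};

/* Turn a comma-separated value such as "hex,nocompact" into a bitmask.
 * Unknown names and empty entries are ignored. */
static uint32_t
parse_debug_flags(const char *value)
{
   uint32_t flags = 0;
   if (!value)
      return 0;

   while (*value) {
      size_t len = strcspn(value, ",");   /* length of the current name */
      for (size_t i = 0; i < sizeof(debug_options) / sizeof(debug_options[0]); i++) {
         if (strlen(debug_options[i].name) == len &&
             strncmp(debug_options[i].name, value, len) == 0)
            flags |= debug_options[i].flag;
      }
      value += len;
      if (*value == ',')
         value++;                         /* skip the separator */
   }
   return flags;
}
```

In a driver this would be fed `getenv("INTEL_DEBUG")` once at startup.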
Matt Turner [Mon, 26 Oct 2015 02:16:39 +0000 (19:16 -0700)]
i965: Print the type and writemask on null destinations.

These are often useful in debugging, and the writemask (actually
"Channel Enables") determines more than just what goes into the
destination.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 11:09:35 +0000 (04:09 -0700)]
i965: Test fixed_hw_reg.file against BRW_IMMEDIATE_VALUE, not IMM.

No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 29 Oct 2015 04:19:52 +0000 (21:19 -0700)]
i965/vec4: Test against BRW_IMMEDIATE_VALUE, not IMM.

No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 29 Oct 2015 17:29:55 +0000 (10:29 -0700)]
i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 29 Oct 2015 04:11:46 +0000 (21:11 -0700)]
i965/cfg: Handle no-idom case in cfg_t::dump_domtree().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 00:44:59 +0000 (17:44 -0700)]
i965/disasm: Remove unused _addr_mode argument from src_ia1().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 26 Oct 2015 00:20:54 +0000 (17:20 -0700)]
i965: Set correct field for indirect align16 addrimm.

This has been wrong since the initial import of the i965 driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 06:15:03 +0000 (23:15 -0700)]
i965/vec4: Drop brw_set_default_* before popping insn state.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 24 Oct 2015 06:13:07 +0000 (23:13 -0700)]
i965/vec4: Remove unnecessary #includes from the generator.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Dave Airlie [Fri, 30 Oct 2015 00:39:13 +0000 (10:39 +1000)]
r600: enable SB for geom shaders on pre-evergreen

I've checked with piglit and one test fails, but it fails
on evergreen as well, so it will get fixed later.

Otherwise SB seems to be working fine for geom shaders on my
rv635.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Thu, 22 Oct 2015 22:35:15 +0000 (15:35 -0700)]
i965/vec4: Eliminate the vec4_generator class altogether.

We really weren't taking advantage of vec4_generator being a class.
By adding a "p" parameter to the helper methods, and "prog_data" to
ones which need binding table information, we can convert everything
to static functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Oct 2015 22:04:52 +0000 (15:04 -0700)]
i965/vec4: Move vec4_generator class definition into the .cpp file.

The public API for the generator is brw_vec4_generate_code(); nobody
actually needs to use the class.  This means we can extend it without
triggering the recompiles associated with altering brw_vec4.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Oct 2015 22:01:27 +0000 (15:01 -0700)]
i965/vec4: Wrap vec4_generator in a C function.

vec4_generator is a class for convenience, but only exports a single
method as its public API.  It makes much more sense to just export a
single function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 22 Oct 2015 23:04:15 +0000 (16:04 -0700)]
i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.

This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging.  After that, everything is in fixed brw_regs.

This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files.  In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.

Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ivan Kalvachev [Sat, 24 Oct 2015 22:16:58 +0000 (01:16 +0300)]
r600g: Fix special negative immediate constants when using ABS modifier.

Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using the NEG modifier.

However, some shaders define a -1.0 constant and want to use it as 1.0.
They do so by using the ABS modifier. But r600_bytecode_special_constants()
sets NEG in addition to ABS. Since the NEG modifier has priority over ABS,
we get -|1.0| as the result, instead of |1.0|.

The patch simply prevents the additional switching of NEG when ABS is set.

[According to Ivan Kalvachev, this bug was found via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
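A sketch of the bug and the fix with a hypothetical operand struct: modifiers are applied as the commit states (ABS first, then NEG, yielding -|x|), and the encoder only toggles NEG for a negative literal when ABS is not set, since |-c| equals |c|. This is an illustration, not r600_bytecode_special_constants() itself.

```c
#include <stdbool.h>

/* Hypothetical source operand: an inline constant plus modifiers. */
struct src_operand {
   float value;   /* inline constant the hardware provides, e.g. 1.0 */
   bool abs;
   bool neg;
};

static float
abs_f(float x)
{
   return x < 0.0f ? -x : x;
}

/* Modifier semantics from the commit: NEG takes priority over ABS, so
 * with both set the result is -|x|. */
static float
apply_modifiers(struct src_operand s)
{
   float v = s.abs ? abs_f(s.value) : s.value;
   return s.neg ? -v : v;
}

/* Encode a literal whose magnitude has an inline form (e.g. -1.0 as
 * 1.0 + NEG). With the fix, NEG is not toggled when ABS is set, because
 * the sign of the literal cannot matter after taking the absolute value. */
static struct src_operand
encode_special_constant(float literal, bool abs_mod, bool neg_mod)
{
   struct src_operand s = { abs_f(literal), abs_mod, neg_mod };
   if (literal < 0.0f && !abs_mod)
      s.neg = !s.neg;
   return s;
}
```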
Nicolai Hähnle [Thu, 22 Oct 2015 23:06:15 +0000 (01:06 +0200)]
st/mesa: fix mipmap generation for immutable textures with incomplete pyramids

Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .

A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).

v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
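A sketch of the clamp, using hypothetical helpers: the last level that mipmap generation may touch is derived from the full pyramid, then clamped to the levels actually allocated for immutable storage (`num_levels`, assumed to be at least 1), so generation stays inside the existing storage instead of reallocating it.

```c
#include <stdbool.h>

/* Number of levels in a full mip pyramid for a w x h base image. */
static unsigned
full_pyramid_levels(unsigned width, unsigned height)
{
   unsigned size = width > height ? width : height;
   unsigned levels = 1;
   while (size > 1) {
      size >>= 1;
      levels++;
   }
   return levels;
}

/* Last level mipmap generation may write, starting from base_level whose
 * image is width x height. For immutable storage, clamp to the levels that
 * were actually allocated (num_levels >= 1) rather than the full pyramid. */
static unsigned
last_generated_level(unsigned base_level, unsigned width, unsigned height,
                     bool immutable, unsigned num_levels)
{
   unsigned last = base_level + full_pyramid_levels(width, height) - 1;
   if (immutable && last > num_levels - 1)
      last = num_levels - 1;
   return last;
}
```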
Nanley Chery [Wed, 14 Oct 2015 21:32:43 +0000 (14:32 -0700)]
mesa: Enable ASTC in GLES' [NUM_]COMPRESSED_TEXTURE_FORMATS queries

In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.

This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agomesa/texcompress: Restrict FXT1 format to desktop GL subset
Nanley Chery [Wed, 14 Oct 2015 04:05:07 +0000 (21:05 -0700)]
mesa/texcompress: Restrict FXT1 format to desktop GL subset

In agreement with the extension spec and commit
dd0eb004874645135b9aaac3ebbd0aaf274079ea, restrict FXT1 formats to the
desktop GL profiles. We no longer advertise such formats as supported
in an ES context only to throw an INVALID_ENUM error when the client
tries to use them with CompressedTexImage2D.
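
A sketch of the filtering idea. The API and format enums are hypothetical and cut down for illustration. Only the FXT1 rule mirrors this commit, and the other formats simply pass through. is_desktop_gl() stands in for _mesa_is_desktop_gl() and is not Mesa's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified API and format enums for illustration only. */
enum api { API_OPENGL_COMPAT, API_OPENGL_CORE, API_OPENGLES2 };
enum cformat { FMT_ETC2_RGB8, FMT_FXT1_RGB, FMT_FXT1_RGBA, FMT_ASTC_4x4 };

static bool is_desktop_gl(enum api api)
{
    return api == API_OPENGL_COMPAT || api == API_OPENGL_CORE;
}

/* FXT1 is only advertised for desktop GL; other formats pass through. */
static bool format_advertised(enum cformat f, enum api api)
{
    if (f == FMT_FXT1_RGB || f == FMT_FXT1_RGBA)
        return is_desktop_gl(api);
    return true;
}

/* Fill out[] with the advertised formats and return their count, i.e.
 * what a NUM_COMPRESSED_TEXTURE_FORMATS-style query would report. */
static size_t list_formats(enum api api, enum cformat *out, size_t cap)
{
    static const enum cformat all[] = {
        FMT_ETC2_RGB8, FMT_FXT1_RGB, FMT_FXT1_RGBA, FMT_ASTC_4x4
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++) {
        if (format_advertised(all[i], api)) {
            assert(n < cap);
            out[n++] = all[i];
        }
    }
    return n;
}
```

Because the list and the count come from the same filter, an ES context never sees a format that CompressedTexImage2D would reject.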

Fixes the following 26 dEQP tests:
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d

v2: Use _mesa_is_desktop_gl() (Ilia, Ian)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
8 years agonvc0: expose a group of performance metrics on Fermi
Samuel Pitoiset [Wed, 28 Oct 2015 10:20:36 +0000 (11:20 +0100)]
nvc0: expose a group of performance metrics on Fermi

This allows monitoring these performance metrics through
GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agost/mesa: create temporary textures with the same nr_samples as source
Ilia Mirkin [Wed, 28 Oct 2015 19:38:53 +0000 (15:38 -0400)]
st/mesa: create temporary textures with the same nr_samples as source

Not sure whether this is actually reachable in practice (i.e. a complex
copy involving MS textures).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoglsl: add fragdata arrays to program resource list
Tapani Pälli [Tue, 27 Oct 2015 11:18:42 +0000 (13:18 +0200)]
glsl: add fragdata arrays to program resource list

This makes sure that the user is still able to query properties of
variables that have been removed by the opt_dead_builtin_varyings pass.

Fixes the following OpenGL ES 3.1 test:
   ES31-CTS.program_interface_query.output-layout

No Piglit regressions.

v2: cleanup, drop extra parenthesis (Topi)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
8 years agomesa: add fragdata_arrays list to gl_shader
Tapani Pälli [Tue, 27 Oct 2015 11:18:41 +0000 (13:18 +0200)]
mesa: add fragdata_arrays list to gl_shader

This is required to store information about fragdata arrays. Currently
these variables get lost and cannot be retrieved later in a sensible way
for program interface queries. The list will be utilized by the next patch.

The patch also modifies the opt_dead_builtin_varyings pass to build the
list when lowering fragdata arrays. This is the same approach as taken
with the packed varyings pass.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
8 years agoglsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsized arrays
Samuel Iglesias Gonsalvez [Thu, 22 Oct 2015 08:07:54 +0000 (10:07 +0200)]
glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsized arrays

From ARB_program_interface_query:

"For the property of BUFFER_DATA_SIZE, then the implementation-dependent
 minimum total buffer object size, in basic machine units, required to hold
 all active variables associated with an active uniform block, shader
 storage block, or atomic counter buffer is written to <params>.  If the
 final member of an active shader storage block is array with no declared
 size, the minimum buffer size is computed assuming the array was declared
 as an array with one element."
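
A sketch of the quoted rule, assuming the final member's offset and array stride are already known from the block layout. The struct and function are hypothetical. Take a std430 block { vec4 a; float b[]; }: b sits at offset 16 with a stride of 4, so the minimum size is 16 + 1 * 4 = 20 bytes.

```c
#include <assert.h>

/* Hypothetical description of the final member of a shader storage block. */
struct last_member {
    unsigned offset;       /* byte offset of the member within the block */
    unsigned array_stride; /* byte stride between array elements */
    unsigned array_size;   /* declared element count; 0 means unsized */
};

/* Minimum buffer size per the quoted rule: an unsized final array is
 * treated as if it had been declared with exactly one element. */
static unsigned min_buffer_data_size(struct last_member m)
{
    unsigned elems = m.array_size == 0 ? 1 : m.array_size;
    assert(m.array_stride > 0);
    return m.offset + elems * m.array_stride;
}
```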

Fixes the following dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array

v2:
- Fix comment's indentation and explain that the parser already
  checked that the unsized array is the last member of a shader
  storage block (Iago).
- Add assert (Iago).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years agodocs: Mark GL_ARB_fragment_layer_viewport as done on i965.
Kenneth Graunke [Thu, 29 Oct 2015 05:02:39 +0000 (22:02 -0700)]
docs: Mark GL_ARB_fragment_layer_viewport as done on i965.

8 years agoi965: Implement ARB_fragment_layer_viewport.
Kenneth Graunke [Wed, 17 Jun 2015 20:06:18 +0000 (13:06 -0700)]
i965: Implement ARB_fragment_layer_viewport.

Normally, we could read gl_Layer from bits 26:16 of R0.0.  However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying.  We have the SF override the value to 0
when the previous stage didn't actually write a value (the specification
defines it to read as 0 in that case).
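
A small illustration of why bits 26:16 are not enough. The field is only 11 bits wide (26 - 16 + 1), so anything above 2047 written by a previous stage cannot survive. pack_layer_field() is a hypothetical model of what would land in that field. It is not how the hardware actually fills R0.0:

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits 26:16 of a dword, the field that holds gl_Layer in R0.0
 * according to the commit message (11 bits, so at most 2047). */
static uint32_t layer_from_r0(uint32_t r0_0)
{
    return (r0_0 >> 16) & 0x7ff;
}

/* Hypothetical: what the field would contain if a previous stage wrote
 * 'written' as the layer (only the low 11 bits fit). */
static uint32_t pack_layer_field(uint32_t written)
{
    return (written & 0x7ff) << 16;
}
```

A written value of 5000 would come back as 904 this way. Passing the full 32-bit VUE header value avoids that loss.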

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agoi965: Make calculate_attr_overrides return the URB read offset.
Kenneth Graunke [Mon, 26 Oct 2015 07:52:14 +0000 (00:52 -0700)]
i965: Make calculate_attr_overrides return the URB read offset.

Traditionally, we've hardcoded "URB Entry Read Offset" to 1 (which
represents 2 vec4 varying slots) to skip over the 8 DWord VUE header.

In order to support ARB_fragment_layer_viewport, we'll need to read
from that header.  This patch adds the basic plumbing necessary to
calculate a value dynamically and hook it up in the SBE packets.
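
A sketch of the arithmetic behind the read offset, based only on the units stated above: one unit is 2 vec4 slots, i.e. 8 DWords. urb_entry_read_offset() is a hypothetical helper. It is not the actual calculate_attr_overrides() logic:

```c
#include <assert.h>

/* A vec4 slot is 4 DWords; the read offset counts pairs of slots. */
#define DWORDS_PER_SLOT 4u
#define SLOTS_PER_READ_UNIT 2u

/* Hypothetical: read offset for the first VUE slot the fragment stage
 * needs. Rounds down so that slot is still included in the read. */
static unsigned urb_entry_read_offset(unsigned first_needed_slot)
{
    return first_needed_slot / SLOTS_PER_READ_UNIT;
}

/* DWords skipped at the start of each URB entry for a given offset. */
static unsigned dwords_skipped(unsigned read_offset)
{
    assert(read_offset < 64); /* keep the example's arithmetic small */
    return read_offset * SLOTS_PER_READ_UNIT * DWORDS_PER_SLOT;
}
```

The old hardcoded 1 corresponds to urb_entry_read_offset(2), which skips the 8 DWord header. Reading gl_Layer from the header needs an offset of 0.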

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agoglsl: Mark gl_ViewportIndex and gl_Layer varyings as flat.
Kenneth Graunke [Mon, 26 Oct 2015 07:14:13 +0000 (00:14 -0700)]
glsl: Mark gl_ViewportIndex and gl_Layer varyings as flat.

Integer varyings need to be flat qualified - all others were already.
I think we just missed this.  Presumably some hardware passes this via
sideband and ignores attribute interpolation, so no one has noticed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agoi965/fs: Properly check for PAD in fragment shaders with > 16 varyings.
Kenneth Graunke [Mon, 26 Oct 2015 08:03:12 +0000 (01:03 -0700)]
i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.

Commit 268008f98c3810b9f276df985dc93efc0c49f33e changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT.  I missed
updating this.  It also means that commit message was wrong, as some
code *did* rely on slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we most likely just emitted extra, pointless code without
breaking anything.  We might also simply have no tests for that.
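
A simplified sketch of the kind of check that has to change. The enum is hypothetical and stands in for the brw varying slot values. A loop that only skips COUNT would treat PAD entries as real inputs:

```c
#include <assert.h>

/* Hypothetical slot values; unused VUE map slots are now filled with
 * SLOT_PAD (they used to be filled with SLOT_COUNT). */
enum varying_slot { SLOT_POS, SLOT_COL0, SLOT_VAR0, SLOT_PAD, SLOT_COUNT };

/* Count the slots that actually carry a varying, skipping padding. */
static unsigned count_used_slots(const enum varying_slot *map, unsigned n)
{
    unsigned used = 0;
    for (unsigned i = 0; i < n; i++) {
        if (map[i] == SLOT_PAD || map[i] == SLOT_COUNT)
            continue;
        used++;
    }
    return used;
}
```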

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agoi965: Update stale comment about unused VUE map slots.
Kenneth Graunke [Mon, 26 Oct 2015 08:02:18 +0000 (01:02 -0700)]
i965: Update stale comment about unused VUE map slots.

I changed this from COUNT to PAD in commit 268008f98c3810b9f276df985dc93ef.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
8 years agonv50/ir: adapt to new method for passing in cull/clip distance masks
Ilia Mirkin [Sat, 24 Oct 2015 03:25:33 +0000 (23:25 -0400)]
nv50/ir: adapt to new method for passing in cull/clip distance masks

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: share shaders between contexts and build immediately
Ilia Mirkin [Tue, 20 Oct 2015 22:50:54 +0000 (18:50 -0400)]
nvc0: share shaders between contexts and build immediately

Build shaders immediately rather than deferring until draw time. This
should hopefully reduce stuttering, as well as enable shader-db style analysis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: do upload-time fixups for interpolation parameters
Ilia Mirkin [Tue, 20 Oct 2015 22:03:40 +0000 (18:03 -0400)]
nvc0: do upload-time fixups for interpolation parameters

Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.

While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.
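
A simplified model of the upload-time decision described above. The enum, the struct and resolve_interp() are all hypothetical. The point is that the rasterizer state is applied after code generation, so one compiled shader serves every combination:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-input interpolation state. */
enum interp { INTERP_DEFAULT, INTERP_FLAT, INTERP_PERSPECTIVE, INTERP_SAMPLE };

struct fs_input {
    bool is_color;         /* gl_Color / gl_SecondaryColor */
    enum interp qualifier; /* INTERP_DEFAULT if no explicit qualifier */
};

/* Flatshading only affects colors without an explicit qualifier, and
 * force_persample upgrades every non-flat input to per-sample. */
static enum interp resolve_interp(struct fs_input in, bool flatshade,
                                  bool force_persample)
{
    enum interp mode = in.qualifier;
    if (mode == INTERP_DEFAULT)
        mode = (in.is_color && flatshade) ? INTERP_FLAT : INTERP_PERSPECTIVE;
    if (force_persample && mode != INTERP_FLAT)
        mode = INTERP_SAMPLE;
    return mode;
}
```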

Fixes the remaining failing glsl-1.30/execution/interpolation piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonir: Copy "patch" flag from ir_variable to nir_variable.
Kenneth Graunke [Fri, 2 Oct 2015 07:01:23 +0000 (00:01 -0700)]
nir: Copy "patch" flag from ir_variable to nir_variable.

This was introduced in GLSL IR after NIR development had branched.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years agonir: Add intrinsics for tessellation shader system values.
Kenneth Graunke [Fri, 9 Oct 2015 06:53:47 +0000 (23:53 -0700)]
nir: Add intrinsics for tessellation shader system values.

nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.

nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.

nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)
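
The mapping above, restated as a small lookup table for quick reference. The stage bits and intrinsic_for() are hypothetical helpers for this sketch. Only the intrinsic names and built-ins come from the commit message:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stage bits used only for this illustration. */
#define STAGE_TCS (1u << 0)
#define STAGE_TES (1u << 1)

struct sysval_intrinsic {
    const char *intrinsic;
    const char *glsl_builtin;
    unsigned stages; /* stages where the built-in is a system value */
};

static const struct sysval_intrinsic tess_sysvals[] = {
    { "nir_intrinsic_load_patch_vertices_in", "gl_PatchVerticesIn",
      STAGE_TCS | STAGE_TES },
    { "nir_intrinsic_load_tess_coord", "gl_TessCoord", STAGE_TES },
    { "nir_intrinsic_load_tess_level_outer", "gl_TessLevelOuter", STAGE_TES },
    { "nir_intrinsic_load_tess_level_inner", "gl_TessLevelInner", STAGE_TES },
};

/* Intrinsic for a built-in in a stage, or NULL if it is not a system value
 * there (e.g. tess levels in the TCS, which are outputs). */
static const char *intrinsic_for(const char *builtin, unsigned stage)
{
    assert(builtin != NULL);
    for (size_t i = 0; i < sizeof(tess_sysvals) / sizeof(tess_sysvals[0]); i++) {
        if (strcmp(tess_sysvals[i].glsl_builtin, builtin) == 0 &&
            (tess_sysvals[i].stages & stage))
            return tess_sysvals[i].intrinsic;
    }
    return NULL;
}
```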

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years agoi965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.
Kenneth Graunke [Wed, 28 Oct 2015 07:53:20 +0000 (00:53 -0700)]
i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.

Consider the case of two nearly identical GLSL fragment shaders:

   out vec4 color;
   void main() { color = vec4(1); }

and

   layout(early_fragment_tests) in;
   out vec4 color;
   void main() { color = vec4(1); }

These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.

Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.

When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL.  Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.

However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset.  With reuse, both
programs have the same offset, but prog_data changed.  We have to
flag it, but failed to.

To fix this, we simply need to check if the aux (prog_data) pointer
changed.  If either the assembly or the prog_data differs, flag it.
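
A minimal sketch of the old and fixed conditions. The struct is a hypothetical reduction of what brw_search_cache() compares: where the assembly lives, plus the prog_data pointer. It is not the i965 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified result of a program cache lookup. */
struct cache_result {
    uint32_t offset; /* offset of the assembly in the cache BO */
    const void *aux; /* prog_data for this particular key */
};

/* Old check: only a changed offset counted as a new program. */
static bool needs_flag_old(struct cache_result prev, struct cache_result cur)
{
    return cur.offset != prev.offset;
}

/* Fixed check: reused assembly may share an offset while prog_data
 * differs, so a changed aux pointer must also flag the state. */
static bool needs_flag_fixed(struct cache_result prev, struct cache_result cur)
{
    return cur.offset != prev.offset || cur.aux != prev.aux;
}
```

With the two shaders above, both lookups return the same offset but different prog_data. Only the fixed check flags BRW_NEW_*_PROG_DATA.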

This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly.  Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!

Fixes GPU hangs in Dolphin on Broadwell.

Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
8 years agoclover: fix building with clang-3.8
Laurent Carlier [Wed, 28 Oct 2015 14:47:09 +0000 (15:47 +0100)]
clover: fix building with clang-3.8

https://bugs.freedesktop.org/show_bug.cgi?id=92705

v2.1: use Linker::Flags::None instead of 0 and emplace_back()

Signed-off-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agonv50: add ARB_copy_image support
Ilia Mirkin [Thu, 29 Oct 2015 00:52:50 +0000 (20:52 -0400)]
nv50: add ARB_copy_image support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: add ARB_copy_image support
Ilia Mirkin [Wed, 28 Oct 2015 20:18:18 +0000 (16:18 -0400)]
nvc0: add ARB_copy_image support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agonvc0: fix crash when nv50_miptree_from_handle fails
Julien Isorce [Tue, 20 Oct 2015 16:34:23 +0000 (17:34 +0100)]
nvc0: fix crash when nv50_miptree_from_handle fails

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agovbo: replace assertion with conditional in vbo_compute_max_verts()
Brian Paul [Tue, 27 Oct 2015 19:50:10 +0000 (13:50 -0600)]
vbo: replace assertion with conditional in vbo_compute_max_verts()

With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test).  Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.
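
A sketch of the pattern, assuming sizes are tracked in floats. The struct and compute_max_verts() are hypothetical and are not the vbo module's actual fields or code. Running out of room is reported as 0 instead of tripping an assertion:

```c
#include <assert.h>

/* Hypothetical model of the buffer used to accumulate vertices. */
struct vtx_buffer {
    unsigned capacity;    /* total floats the buffer can hold */
    unsigned used;        /* floats already written */
    unsigned vertex_size; /* floats per vertex for the current layout */
};

/* Number of whole vertices that still fit; 0 tells the caller to flush. */
static unsigned compute_max_verts(const struct vtx_buffer *b)
{
    assert(b->vertex_size > 0);
    if (b->used >= b->capacity)
        return 0;
    return (b->capacity - b->used) / b->vertex_size;
}
```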

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agomesa: minor formatting fix in get_tex_rgba_compressed()
Brian Paul [Wed, 28 Oct 2015 17:03:21 +0000 (11:03 -0600)]
mesa: minor formatting fix in get_tex_rgba_compressed()

8 years agost/mesa: implement ARB_copy_image
Marek Olšák [Mon, 24 Aug 2015 00:55:20 +0000 (02:55 +0200)]
st/mesa: implement ARB_copy_image

I wonder if the craziness was worth it.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agogallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS
Marek Olšák [Sun, 23 Aug 2015 23:19:35 +0000 (01:19 +0200)]
gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS

For ARB_copy_image.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoradeonsi: allow copying between compatible compressed and uncompressed formats
Marek Olšák [Sun, 23 Aug 2015 23:08:48 +0000 (01:08 +0200)]
radeonsi: allow copying between compatible compressed and uncompressed formats

which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
     DXT5 <-> R32G32B32A32_UINT
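
The compatibility rule in numbers. A DXT1 block (4x4 texels) is 8 bytes, the same size as one R32G32_UINT pixel. A DXT5 block is 16 bytes, matching one R32G32B32A32_UINT pixel. The descriptor struct and both helpers are hypothetical and are not radeonsi code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format descriptions: for a compressed format, 'bytes' is
 * the size of one block; for a plain format, the size of one pixel. */
struct fmt_desc {
    bool compressed;
    unsigned block_w, block_h; /* 4x4 for DXT, 1x1 for plain formats */
    unsigned bytes;
};

static const struct fmt_desc DXT1 = { true, 4, 4, 8 };
static const struct fmt_desc DXT5 = { true, 4, 4, 16 };
static const struct fmt_desc R32G32_UINT = { false, 1, 1, 8 };
static const struct fmt_desc R32G32B32A32_UINT = { false, 1, 1, 16 };

/* Mixed compressed/plain copy: allowed when one block and one pixel have
 * the same byte size, so each block maps to exactly one pixel. */
static bool copy_compatible(const struct fmt_desc *a, const struct fmt_desc *b)
{
    if (a->compressed == b->compressed)
        return false; /* this sketch only covers the mixed case */
    return a->bytes == b->bytes;
}

/* Blocks across a region w texels wide, i.e. its width in plain pixels. */
static unsigned blocks_across(const struct fmt_desc *src, unsigned w)
{
    return (w + src->block_w - 1) / src->block_w;
}
```

A 16-texel-wide DXT1 region therefore copies to a 4-pixel-wide R32G32_UINT region.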

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agomesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)
Marek Olšák [Tue, 27 Oct 2015 10:11:19 +0000 (11:11 +0100)]
mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)

We initialized Target, but not TargetIndex.
This is required since 7d7dd1871174905dfdd3ca874a09d9.

v2: do it in the right place. Noticed by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoi965: remove unneeded src_reg copy in emit_shader_time_write
Emil Velikov [Wed, 7 Oct 2015 11:38:12 +0000 (12:38 +0100)]
i965: remove unneeded src_reg copy in emit_shader_time_write

The variable is already of type src_reg. Creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: remove cache_aux_free_func array
Emil Velikov [Wed, 7 Oct 2015 11:38:11 +0000 (12:38 +0100)]
i965: remove cache_aux_free_func array

There is only one function that can be called, which is well known at
compilation time.

The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate and cut down the size of
struct brw_cache.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agomain: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks
Samuel Iglesias Gonsalvez [Tue, 27 Oct 2015 08:33:01 +0000 (09:33 +0100)]
main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks

The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.
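
A simplified counting sketch. It handles only single-level arrays of basic types or structs and ignores nested arrays. Both the member model and the functions are hypothetical. For a block { float x; S s[4]; } where S has 3 fields, a shader storage block yields 1 + 3 = 4 entries, and a uniform block with the same layout yields 1 + 4 * 3 = 13.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block member: a basic type (optionally an array of it)
 * or an array of structs with 'struct_fields' fields. */
struct block_member {
    unsigned array_size;    /* 0 if not an array */
    unsigned struct_fields; /* 0 if not a struct */
};

/* Entries for one member: shader storage blocks only enumerate the first
 * element of an array, while uniform blocks enumerate all of them. */
static unsigned active_vars(struct block_member m, bool is_ssbo)
{
    if (m.struct_fields == 0)
        return 1; /* basic type or array of basic type: one entry */
    unsigned elems = m.array_size == 0 ? 1 : m.array_size;
    return is_ssbo ? m.struct_fields : elems * m.struct_fields;
}

static unsigned block_active_vars(const struct block_member *m, unsigned n,
                                  bool is_ssbo)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += active_vars(m[i], is_ssbo);
    return total;
}
```

GL_MAX_NUM_ACTIVE_VARIABLES is then the largest such count over all shader storage blocks.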

Fixes 3 dEQP-GLES31.functional.* tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
8 years agost/vdpau: disable RefPicList for Vdpau HEVC
Boyuan Zhang [Fri, 23 Oct 2015 17:44:23 +0000 (13:44 -0400)]
st/vdpau: disable RefPicList for Vdpau HEVC

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
8 years agost/va: add VAAPI HEVC decode support
Boyuan Zhang [Fri, 23 Oct 2015 17:37:48 +0000 (13:37 -0400)]
st/va: add VAAPI HEVC decode support

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
8 years agoradeon/uvd: implement and add flag for VAAPI HEVC decode
Boyuan Zhang [Fri, 23 Oct 2015 16:30:33 +0000 (12:30 -0400)]
radeon/uvd: implement and add flag for VAAPI HEVC decode

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>