mesa.git
Jason Ekstrand [Fri, 8 Aug 2014 21:30:25 +0000 (14:30 -0700)]
i965/cse: Don't eliminate instructions with side-effects

This causes problems when converting atomics to use the GRF.  Sometimes the atomic operation would get eaten by CSE when it shouldn't.

v2: Roll the has_side_effects check into is_expression

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
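A minimal sketch of the rule described above, assuming a toy instruction type. CSE treats an instruction as a reusable expression only if it computes a pure value, so an instruction with side effects (such as an atomic) is never merged with an earlier identical one. The `struct inst`, the `OP_*` opcodes, `has_side_effects()`, and `is_expression()` here are hypothetical stand-ins, not the i965 `fs_inst` code, and the pass ignores intervening writes to the sources for brevity.

```c
#include <stdbool.h>

/* Hypothetical opcodes for a toy IR; not the real i965 enum. */
enum opcode { OP_ADD, OP_MUL, OP_MOV, OP_UNTYPED_ATOMIC };

struct inst {
   enum opcode op;
   int dst, src0, src1;
};

/* Atomics write memory, so they must execute even if an identical
 * instruction ran earlier. */
static bool has_side_effects(const struct inst *inst)
{
   return inst->op == OP_UNTYPED_ATOMIC;
}

/* Only pure, value-producing instructions are CSE candidates. */
static bool is_expression(const struct inst *inst)
{
   if (has_side_effects(inst))
      return false;

   switch (inst->op) {
   case OP_ADD:
   case OP_MUL:
   case OP_MOV:
      return true;
   default:
      return false;
   }
}

static bool same_expression(const struct inst *a, const struct inst *b)
{
   return a->op == b->op && a->src0 == b->src0 && a->src1 == b->src1;
}

/* Count instructions a naive local CSE pass would replace with an
 * earlier result (ignores intervening writes to the sources). */
static int count_eliminated(const struct inst *insts, int n)
{
   int eliminated = 0;

   for (int i = 0; i < n; i++) {
      if (!is_expression(&insts[i]))
         continue;
      for (int j = 0; j < i; j++) {
         if (is_expression(&insts[j]) &&
             same_expression(&insts[j], &insts[i])) {
            eliminated++;
            break;
         }
      }
   }
   return eliminated;
}
```

With two identical ADDs and two identical atomics in one block, only the second ADD is reported as redundant; both atomics survive.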
Jason Ekstrand [Mon, 4 Aug 2014 22:17:15 +0000 (15:17 -0700)]
docs/GL3: Mark ARB_copy_image as implemented on i965

Jason Ekstrand [Fri, 27 Jun 2014 23:05:37 +0000 (16:05 -0700)]
i965: Add support for ARB_copy_image

This, together with the meta path, provides a complete implementation of
ARB_copy_image.

v2: Add a fallback memcpy path for when the texture is too big for the
    blitter
v3: Properly support copying between two places on the same texture in the
    memcpy fallback
v4: Properly handle blit between the same two images in the fallback path
v5: Properly handle blit between the same two compressed images in the
    fallback path
v6: Fix a typo in a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Jason Ekstrand [Fri, 25 Jul 2014 21:08:59 +0000 (14:08 -0700)]
mesa/meta: Add a partial implementation of CopyImageSubData

This provides an implementation of CopyImageSubData that works if both
textures are uncompressed.  This implementation works by using a
combination of texture views and BlitFramebuffer.  If one of the textures
is compressed, it returns false and the driver is expected to provide a
fallback.

v2: Don't leak FBOs

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v3: Change glGen/DeleteTextures to _mesa_Gen/DeleteTextures

Jason Ekstrand [Fri, 25 Jul 2014 21:07:49 +0000 (14:07 -0700)]
mesa/meta: Make _mesa_meta_bind_fbo_image also take a framebuffer target

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Jason Ekstrand [Fri, 27 Jun 2014 22:34:53 +0000 (15:34 -0700)]
mesa: Add GL API support for ARB_copy_image

This adds the API entrypoint, error checking logic, and a driver hook for
the ARB_copy_image extension.

v2: Fix a typo in ARB_copy_image.xml and add it to the makefile
v3: Put ARB_copy_image.xml in the right place alphabetically in the
    makefile and properly prefix the commit message
v4: Fixed some line wrapping and added a check for null
v5: Check for incomplete renderbuffers

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v6: Update dispatch_sanity for the addition of CopyImageSubData

Matt Turner [Mon, 11 Aug 2014 02:03:34 +0000 (19:03 -0700)]
i965/fs: Keep track of the registers that hold delta_x/delta_y.

They're needed in register allocation. Fixes a regression since
afe3d155.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78875
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 11 Aug 2014 04:32:24 +0000 (21:32 -0700)]
i965: Mark branch unreachable in sampler state code.

Silences some uninitialized variable warnings.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian Paul [Fri, 8 Aug 2014 21:10:31 +0000 (15:10 -0600)]
mesa: simplify _mesa_update_draw_buffers()

There's no need to copy the array of DrawBuffer enums to a temp array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 8 Aug 2014 21:01:50 +0000 (15:01 -0600)]
mesa: fix assertion in _mesa_drawbuffers()

Fixes failed assertion when _mesa_update_draw_buffers() was called
with GL_DRAW_BUFFER == GL_FRONT_AND_BACK.  The piglit gl30basic hit
this.

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Brian Paul [Fri, 8 Aug 2014 19:22:28 +0000 (13:22 -0600)]
mesa: whitespace, 80-column wrapping in program.c

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 8 Aug 2014 19:19:49 +0000 (13:19 -0600)]
mesa: simplify/rename _mesa_init_program_struct()

No need to return a value.  Remove unused ctx parameter.  Remove
_mesa_ prefix since it's static.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 8 Aug 2014 13:51:47 +0000 (07:51 -0600)]
st/mesa: use PRId64 for printing 64-bit ints

v2: use signed types/formats

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 8 Aug 2014 13:49:33 +0000 (07:49 -0600)]
mesa: use PRId64 for printing 64-bit ints

Silences MinGW warnings:
 warning: unknown conversion type character ‘l’ in format [-Wformat]
 warning: too many arguments for format [-Wformat-extra-args]

v2: use signed types/formats

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
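The portable approach is to build the format string from the `<inttypes.h>` macros instead of hardcoding a length modifier such as `%lld`, which the MinGW format checker flags as shown above. A minimal sketch; `format_i64()` is a hypothetical helper, not a Mesa function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a signed 64-bit value portably.  PRId64 expands to whatever
 * length modifier the target's printf expects for int64_t. */
static int format_i64(char *buf, size_t size, int64_t value)
{
   return snprintf(buf, size, "value = %" PRId64, value);
}
```

Because v2 switched to signed types, `PRId64` is the matching macro; an unsigned `uint64_t` would use `PRIu64` instead.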
Brian Paul [Fri, 8 Aug 2014 13:46:45 +0000 (07:46 -0600)]
mesa: define and use ALL_TYPE_BITS in varray.c code

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 8 Aug 2014 13:45:42 +0000 (07:45 -0600)]
mesa: add comment that GL_CLIP_DISTANCE0 == GL_CLIP_PLANE0 in enable.c

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Maarten Lankhorst [Mon, 11 Aug 2014 11:16:05 +0000 (13:16 +0200)]
configure.ac: Do not require llvm on x32

Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Maarten Lankhorst <dev@mblankhorst.nl>
Neil Roberts [Tue, 1 Jul 2014 15:04:56 +0000 (16:04 +0100)]
i965: Don't check for format differences when using the blorp blitter

Previously the blorp blitter wouldn't be used if the source and destination
buffers had different formats, other than swizzling between RGB and BGR and
adding or removing a dummy alpha channel. However, there's no reason why the
blorp code path can't be used for almost all format conversions, so this
patch just removes the checks. It does, however, explicitly disable converting
to/from MESA_FORMAT_Z24_UNORM_X8_UINT because there is a similar check in
brw_blorp_copytexsubimage.

This doesn't cause any Piglit test regressions, at least on Ivybridge.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Fri, 11 Jul 2014 22:54:11 +0000 (15:54 -0700)]
i965/eu: Allow math on immediates on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 3 Jul 2014 22:01:58 +0000 (15:01 -0700)]
i965/eu: Update jump distance scaling for Broadwell.

Broadwell measures jump distances in bytes, so we need to scale by 16.

v2: Update the function in brw_eu.h, not in brw_eu_emit.c.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 30 Jun 2014 15:00:25 +0000 (08:00 -0700)]
i965/eu: Refactor jump distance scaling to use a helper function.

Different generations of hardware measure jump distances in different
units.  Previously, every function that needed to set a jump target
open-coded this scaling, or made a hardcoded assumption (i.e. just used 2).

Most functions start with the number of instructions to jump, and scale
up to the hardware-specific value.  So, I made the function match that.

Others start with a byte offset, and divide by a constant (8) to obtain
the jump distance.  This is actually 16 / 2 (the jump scale for Gen5-7).

v2: Make the helper a static inline defined in brw_eu.h, instead of
    an actual function in brw_eu_emit.c (as suggested by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
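The scaling described in this commit and the Broadwell commit above can be sketched as a small helper. The commits give the Gen5-7 scale (2, so a byte offset of 16-byte instructions divides by 8) and the Broadwell scale (16, i.e. bytes); the Gen4 value of 1 is an assumption for illustration, and `jump_scale()` and the `jump_distance_*()` functions are hypothetical names, not the actual brw_eu.h helper.

```c
/* Size of one uncompacted instruction in bytes. */
#define INST_SIZE_BYTES 16

/* Jump-field units per instruction: Gen8 counts bytes, Gen5-7 count
 * 2 units per instruction, Gen4 = 1 is assumed here. */
static unsigned jump_scale(int gen)
{
   if (gen >= 8)
      return 16;
   if (gen >= 5)
      return 2;
   return 1;
}

/* Most callers know how many instructions to jump over. */
static int jump_distance_from_insts(int gen, int num_insts)
{
   return num_insts * (int)jump_scale(gen);
}

/* Others start from a byte offset; on Gen5-7 this is offset / 8,
 * i.e. offset / (16 / 2). */
static int jump_distance_from_bytes(int gen, int byte_offset)
{
   return byte_offset / (INST_SIZE_BYTES / (int)jump_scale(gen));
}
```

Both entry points agree: jumping 3 instructions on Gen7 gives 6 from either a count of 3 or a byte offset of 48, while on Broadwell both give 48.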
Kenneth Graunke [Mon, 30 Jun 2014 15:05:42 +0000 (08:05 -0700)]
i965/eu: Set UIP on ELSE instructions on Broadwell.

Broadwell adds UIP on ELSE instructions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 30 Jun 2014 16:22:27 +0000 (09:22 -0700)]
i965/eu: Make it clear that brw_patch_break_count only runs on Gen4-5.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 30 Jun 2014 15:06:43 +0000 (08:06 -0700)]
i965/eu: Make it clear that brw_find_loop_end only runs on Gen6+.

It has Gen6+ knowledge baked in, and indeed is only called for Gen6+,
but it wasn't immediately obvious that this was the case.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 30 Jun 2014 14:51:51 +0000 (07:51 -0700)]
i965/eu: Port Broadwell CMP destination type hack to brw_eu_emit.c.

See gen8_generator::CMP().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Sat, 28 Jun 2014 22:30:58 +0000 (15:30 -0700)]
i965/eu: Explicitly disable instruction compaction on Broadwell for now.

Until now, it's been off implicitly: we never call the compactor
function.  When we merge the generators, we'll start calling it, so we
should make it do nothing.

Matt will enable instruction compaction properly later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 11 Jul 2014 22:48:14 +0000 (15:48 -0700)]
i965/eu: Use Haswell atomic messages on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Mon, 30 Jun 2014 14:26:30 +0000 (07:26 -0700)]
i965/eu: Change gen == 7 to gen >= 7 in a couple brw_eu_emit.c cases.

Broadwell is going to use the brw_eu_emit.c code soon.  We want to get
the fake MRF handling and URB HWord channel mask handling.

We don't need the CMP thread switch workaround, though.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ben Widawsky [Wed, 30 Jul 2014 18:39:06 +0000 (11:39 -0700)]
i965/clip: Removing scissor atom

Now that we no longer use ctx->DrawBuffer->_Xmin and related fields to
program the screen-space viewport extents, we don't depend on any
scissoring state.  So we can drop the _NEW_SCISSOR dependency.

On GEN8, a change in scissor state does not affect anything for the
clipper/sf hardware state. The hardware will always do the right thing
once the viewport extents are programmed. We can therefore remove the
unnecessary state emission.

Ken originally spotted this.

v2: Reword the commit message. Remove spurious hunk.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ben Widawsky [Thu, 24 Jul 2014 00:55:40 +0000 (17:55 -0700)]
i965/guardband: Enable for all viewport dimensions (GEN8+)

The goal of guardband clipping is to try to avoid 3d clipping because it
is an expensive operation. When guardband clipping is disabled, all
geometry that crosses the viewport edges is sent to the FF 3d clipper.
Objects which are entirely enclosed within the viewport are said to be
"trivially accepted" while those entirely outside of the viewport are
"trivially rejected".

When guardband clipping is turned on, the above behavior is changed such
that if the geometry is within the guardband and intersects the
viewport, it skips the 3d clipper. Prior to GEN8, this was problematic
if the viewport was smaller than the screen as it could allow for
rendering to occur outside of the viewport. That could be mitigated if
the programmer specified a scissor region which was less than or equal
to the viewport - but this is not required for correctness in OpenGL. In
theory you could be clever with the guardband so as not to invoke this
problem. We do not do this, and have no data that suggests we should
bother (nor the converse data).

With viewport extents in place on GEN8, it should be safe to turn on
guardband clipping for all cases.

While here, add a comment to the code which confused me thoroughly.

v2: Update grammar in commit message. Reword comments based on Ken's
suggestion.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
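The trivial-accept / trivial-reject / clip decision described in this commit can be sketched for a primitive's screen-space bounding box. This is a simplified model, assuming axis-aligned boxes and ignoring the near/far planes; `struct box`, `classify()`, and the result names are hypothetical and do not reflect the i965 code or hardware state layout.

```c
#include <stdbool.h>

struct box { float xmin, ymin, xmax, ymax; };

enum clip_result { TRIVIAL_REJECT, TRIVIAL_ACCEPT, SEND_TO_CLIPPER };

static bool box_inside(const struct box *inner, const struct box *outer)
{
   return inner->xmin >= outer->xmin && inner->xmax <= outer->xmax &&
          inner->ymin >= outer->ymin && inner->ymax <= outer->ymax;
}

static bool box_disjoint(const struct box *a, const struct box *b)
{
   return a->xmax < b->xmin || a->xmin > b->xmax ||
          a->ymax < b->ymin || a->ymin > b->ymax;
}

/* Without the guardband, anything straddling the viewport goes to the
 * expensive clipper.  With it, a straddling primitive that still fits
 * in the guardband skips the clipper, and pixels outside the viewport
 * must be discarded later (on Gen8, by the viewport extents). */
static enum clip_result classify(const struct box *prim,
                                 const struct box *viewport,
                                 const struct box *guardband,
                                 bool guardband_enabled)
{
   if (box_disjoint(prim, viewport))
      return TRIVIAL_REJECT;
   if (box_inside(prim, viewport))
      return TRIVIAL_ACCEPT;
   if (guardband_enabled && box_inside(prim, guardband))
      return TRIVIAL_ACCEPT;
   return SEND_TO_CLIPPER;
}
```

A bounding-box test is conservative: a thin diagonal triangle can have a box that overlaps the viewport while the triangle itself does not, in which case this model sends it on rather than rejecting it.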
Ben Widawsky [Thu, 3 Jul 2014 00:07:34 +0000 (17:07 -0700)]
i965: Simplify viewport extents programming on GEN8

Viewport extents are a third rectangle that defines which pixels get
discarded as part of the rasterization process. The actual pixels drawn
to the screen are the intersection of the drawing rectangle, the viewport
extents, and the scissor rectangle. Viewport extents also permit the use
of guardband clipping in all cases (see later patch).

The scissor rectangle is not important for this discussion, as it should
always help do the right thing, provided the programmer uses it.

switch (viewport dimensions, drawrect dimension) {
   case viewport > drawing rectangle: no effects; break;
   case viewport == drawing rectangle: no effects; break;
   case viewport < drawing rectangle:
      Pixels (after the viewport transformation but before expensive
      rasterizing and shading operations) which are outside of the
      viewport are discarded.
}
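The three cases above fall out of a plain rectangle intersection of the drawing rectangle, the viewport extents, and the scissor rectangle. A minimal sketch, assuming inclusive integer pixel bounds; `struct rect`, `rect_intersect()`, and `visible_region()` are hypothetical names, not the hardware state layout.

```c
#include <stdbool.h>

/* Inclusive pixel bounds. */
struct rect { int x0, y0, x1, y1; };

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

/* Intersect two rectangles; returns false if the result is empty. */
static bool rect_intersect(struct rect a, struct rect b, struct rect *out)
{
   out->x0 = max_i(a.x0, b.x0);
   out->y0 = max_i(a.y0, b.y0);
   out->x1 = min_i(a.x1, b.x1);
   out->y1 = min_i(a.y1, b.y1);
   return out->x0 <= out->x1 && out->y0 <= out->y1;
}

/* Pixels that may be written: the drawing rectangle intersected with
 * the viewport extents and the scissor rectangle. */
static bool visible_region(struct rect drawrect, struct rect viewport,
                           struct rect scissor, struct rect *out)
{
   struct rect tmp;

   if (!rect_intersect(drawrect, viewport, &tmp))
      return false;
   return rect_intersect(tmp, scissor, out);
}
```

When the viewport is larger than or equal to the drawing rectangle, the result is just the drawing rectangle, matching the two "no effects" cases; only a smaller viewport actually shrinks the region.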

I am unable to find a test case where this improves performance, but in
all my testing it doesn't hurt performance, and intuitively, it should
not ever hurt performance. It also permits us to use the guardband more
freely (see upcoming patch).

v2: Updating commit message.

v3: Commit message updates requested by Ken

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ben Widawsky [Sat, 2 Aug 2014 03:28:07 +0000 (20:28 -0700)]
i965/guardband: Improve comments for guardband clipping

While working in this part of the code I had a great deal of trouble
understanding what it was trying to do, and matching it with the spec
(mostly due to bad wording in the PRM). To help future people, I've cleaned
up the wording and provided some ASCII art.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Fri, 8 Aug 2014 08:03:15 +0000 (01:03 -0700)]
i965: Support the allow_glsl_extension_directive_midshader option.

This adds support for Marek's new driconf parameter, which avoids
totally white rendering in Unigine Valley (which attempts to enable
the GL_ARB_sample_shading extension in an illegal place).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75664
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Connor Abbott [Fri, 8 Aug 2014 23:25:34 +0000 (16:25 -0700)]
i965/fs: set virtual_grf_count in assign_regs()

This lets us call dump_instructions() after register allocation without
failing an assertion.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Connor Abbott [Fri, 8 Aug 2014 21:57:27 +0000 (14:57 -0700)]
i965/fs: don't read from uninitialized memory while assigning registers

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Matt Turner [Fri, 8 Aug 2014 18:58:05 +0000 (11:58 -0700)]
i965/fs: Fix bad whitespace.

Niels Ole Salscheider [Sun, 10 Aug 2014 10:52:12 +0000 (12:52 +0200)]
gallium/radeon: Set gpu_address to 0 if r600_virtual_address is false

Without this patch I get the following during DMA transfers:
[drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
radeon 0000:01:00.0: CP DMA dst buffer too small (21475829792 4096)

This is a fixup for e878e154cdfd4dbb5474f776e0a6d86fcb983098.

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Sat, 9 Aug 2014 20:26:46 +0000 (22:26 +0200)]
radeonsi: simplify constant buffer upload for big endian

Point util_memcpy_cpu_to_le32 to a buffer storage directly.

v2: simplify more

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
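On big-endian CPUs, constant data has to be converted to the GPU's little-endian layout as it is copied, and this commit writes the converted words straight into the buffer storage instead of going through a temporary. A minimal sketch of such a copy; `copy_cpu_to_le32()` is a hypothetical stand-in for `util_memcpy_cpu_to_le32`, whose actual implementation may differ (for example, a plain `memcpy` on little-endian hosts).

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word as little-endian bytes, whatever the host order. */
static void store_le32(uint8_t *dst, uint32_t v)
{
   dst[0] = (uint8_t)(v & 0xff);
   dst[1] = (uint8_t)((v >> 8) & 0xff);
   dst[2] = (uint8_t)((v >> 16) & 0xff);
   dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Copy n bytes (a multiple of 4) of host-order 32-bit words directly
 * into mapped buffer storage in little-endian order. */
static void copy_cpu_to_le32(void *dst, const uint32_t *src, size_t n)
{
   uint8_t *out = dst;

   for (size_t i = 0; i < n / 4; i++)
      store_le32(out + 4 * i, src[i]);
}
```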
Marek Olšák [Sat, 9 Aug 2014 21:48:41 +0000 (23:48 +0200)]
winsys/radeon: fix compile warnings

Marek Olšák [Sat, 9 Aug 2014 20:24:03 +0000 (22:24 +0200)]
r600g/compute: fix compile warnings

Trivial.

Marek Olšák [Sat, 9 Aug 2014 20:23:23 +0000 (22:23 +0200)]
r300g: handle new shader caps

Trivial.

Marek Olšák [Thu, 7 Aug 2014 19:14:31 +0000 (21:14 +0200)]
radeonsi: fix CMASK and HTILE allocation on Tahiti

Tahiti has 12 tile pipes, but uses the P8 pipe config.

It looks like there is no way to get the pipe config except for reading
GB_TILE_MODE. The TILING_CONFIG ioctl doesn't return more than 8 pipes,
so we can't use that for Hawaii.

This fixes a regression caused by 9b046474c95f15338d4c748df9b62871bba6f36f
on Tahiti.

v2: add an assertion and print an error on failure

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 6 Aug 2014 20:58:18 +0000 (22:58 +0200)]
gallium/radeon: remove r600_resource_va

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 6 Aug 2014 20:29:27 +0000 (22:29 +0200)]
gallium/radeon: use gpu_address from r600_resource

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 6 Aug 2014 20:29:27 +0000 (22:29 +0200)]
r600g: use gpu_address from r600_resource

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 6 Aug 2014 20:29:27 +0000 (22:29 +0200)]
radeonsi: use gpu_address from r600_resource

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 6 Aug 2014 20:27:43 +0000 (22:27 +0200)]
gallium/radeon: store VM address in r600_resource

This will help to get rid of the buffer_get_virtual_address calls.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
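Caching the VM address can be sketched as querying it once when the buffer is created, so later state emission reads a field instead of calling back into the winsys (the buffer_get_virtual_address calls this commit aims to remove). Following the fixup commit above, the address is 0 when virtual memory is not in use. The struct, `resource_init()`, and the `query_va_fn` callback are hypothetical simplifications, not the real `r600_resource` or winsys interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical winsys callback returning a buffer's GPU virtual address. */
typedef uint64_t (*query_va_fn)(int buffer_handle);

struct resource {
   int handle;
   uint64_t gpu_address; /* 0 if the GPU is not using virtual memory */
};

static void resource_init(struct resource *res, int handle,
                          bool has_virtual_memory, query_va_fn query_va)
{
   res->handle = handle;
   /* Query once at creation instead of on every state emission. */
   res->gpu_address = has_virtual_memory ? query_va(handle) : 0;
}

/* Example fake winsys for illustration only: VA = handle << 12. */
static uint64_t fake_query_va(int buffer_handle)
{
   return (uint64_t)buffer_handle << 12;
}
```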
Marek Olšák [Wed, 6 Aug 2014 19:45:41 +0000 (21:45 +0200)]
r600g: remove useless r600_resource_va calls

R600-R700 don't support virtual memory.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 6 Aug 2014 01:18:06 +0000 (03:18 +0200)]
radeonsi: always prefer SWITCH_ON_EOP(0) on CIK

The code is rewritten to take known constraints into account, while always
using 0 by default.

This should improve performance for multi-SE parts in theory.

A debug option is also added for easier debugging. (If there are hangs,
use the option. If the hangs go away, you have found the problem.)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix a typo, set max_se for evergreen GPUs according to the kernel driver

Marek Olšák [Wed, 6 Aug 2014 00:11:04 +0000 (02:11 +0200)]
radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii

This isn't documented anywhere, but it's the only thing that works
for this case.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Fri, 1 Aug 2014 17:36:37 +0000 (19:36 +0200)]
radeon,r200: fix buffer validation after CS flush

This validates all bound buffers (CB, ZB, textures, DMA) at the beginning
of CS. This fixes "bo->space_accounted" assertion failures.

Tested by: Jochen Rollwagen <joro-2013@t-online.de>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Thu, 7 Aug 2014 22:34:31 +0000 (00:34 +0200)]
st/mesa: fix blit-based partial TexSubImage for 1D arrays

This fixes piglit spec/EXT_texture_array/render-1darray.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Thu, 7 Aug 2014 18:58:53 +0000 (20:58 +0200)]
st/mesa: fix DrawPixels(GL_STENCIL_INDEX)

This bug was probably uncovered by Jason's recent commits.

The problem is _mesa_base_tex_format(GL_STENCIL_INDEX) returns -1.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 6 Aug 2014 11:20:41 +0000 (13:20 +0200)]
st/mesa: dump TGSI before calling into the driver

If the driver crashes in create_xx_shader, you want to see the shader.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Jon TURNEY [Fri, 8 Aug 2014 19:13:18 +0000 (20:13 +0100)]
configure.ac: Use LIBS rather than LDFLAGS to add -ldl to dladdr check

ec8ebff "Check for dladdr()" erroneously uses LDFLAGS rather than LIBS to add
-ldl to the dladdr check.

Replace the workaround in 39a4cc4 of explicitly checking in libdl, with a more
correct approach of using LIBS.

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Eric Anholt [Thu, 7 Aug 2014 00:25:31 +0000 (17:25 -0700)]
vc4: Add support for the COS instruction.

Eric Anholt [Wed, 6 Aug 2014 23:41:12 +0000 (16:41 -0700)]
vc4: Add support for the SIN instruction.

v2: Rebase on helpers.

Eric Anholt [Wed, 6 Aug 2014 19:34:00 +0000 (12:34 -0700)]
vc4: Fix register aliasing for packing of scaled coordinates.

Fixes glean fragProg1's "ADD test" and likely many others.

Eric Anholt [Wed, 6 Aug 2014 18:52:57 +0000 (11:52 -0700)]
vc4: Add some debug code for forcing fragment shader output color.

Eric Anholt [Tue, 5 Aug 2014 18:29:07 +0000 (11:29 -0700)]
u_primconvert: Copy min/max_index from the original primitive.

These values are supposed to be the minimum/maximum index values used to
read from the vertex buffers.  This code either copies index values out of
the old IB (so, same min/max as the original draw call), or generates a
new IB (using index values between the start and the start + count of the
old array draw info, which just happens to be what min/max_index are set
to by st_draw.c).

We were incorrectly setting the max_index in the
converting-from-glDrawArrays case to the start vertex plus the number of
vertices generated in the new IB, which broke QUADS primitive conversion
on VC4 (where max_index really has to be correct, or the kernel might
reject your draw call due to buffer overflow).

Reviewed-by: Rob Clark <robclark@freedesktop.org> (from verbal description
             of the patch)
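For a glDrawArrays-style draw of `count` vertices starting at `start`, the converted index buffer only references vertices in [start, start + count - 1], so that range is the correct min/max_index even though more indices are generated. A minimal sketch that converts QUADS to triangle-list indices and copies the range from the original draw; `quads_to_tris()` and the triangle split are assumptions for illustration, not the u_primconvert/u_indices code, and it assumes `count > 0`.

```c
#include <stdint.h>

/* Convert a non-indexed QUADS draw of `count` vertices starting at
 * `start` into triangle-list indices (two triangles per quad).
 * Returns the number of indices written to `out`. */
static unsigned quads_to_tris(unsigned start, unsigned count, uint32_t *out,
                              unsigned *min_index, unsigned *max_index)
{
   unsigned n = 0;

   for (unsigned q = 0; q < count / 4; q++) {
      uint32_t v = start + 4 * q;
      out[n++] = v;     out[n++] = v + 1; out[n++] = v + 2;
      out[n++] = v;     out[n++] = v + 2; out[n++] = v + 3;
   }
   /* The range comes from the original draw, not from n. */
   *min_index = start;
   *max_index = start + count - 1;
   return n;
}
```

With start = 10 and count = 8, 12 indices are generated but the largest is 17; start plus the generated index count would claim 22, past the end of the vertex data.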

Eric Anholt [Fri, 1 Aug 2014 22:45:41 +0000 (15:45 -0700)]
vc4: Fix using and emitting the 1/W from the vertex/coord shaders.

v2: Rebase on helpers change.

Eric Anholt [Fri, 1 Aug 2014 19:50:53 +0000 (12:50 -0700)]
vc4: Add support for swizzles of 32 bit float vertex attributes.

Some tests start working (useprogram-flushverts, for example) due to
getting the right vertices now.  Some that used to pass start failing with
memory overflow during binning, which is weird (glsl-fs-texture2drect).
And a couple stop rendering correctly (glsl-fs-bug25902).

v2: Move the attribute format setup in the key from after search time to
    before the search.
v3: Fix reading of attributes other than position (I forgot to respect
    attr and stored everything in inputs 0-3, i.e. position).

Eric Anholt [Tue, 5 Aug 2014 20:35:19 +0000 (13:35 -0700)]
vc4: Add support for the TGSI FRC opcode.

v2: Rebase on helpers.

Eric Anholt [Tue, 5 Aug 2014 20:33:50 +0000 (13:33 -0700)]
vc4: Add support for the TGSI TRUNC opcode.

v2: Rebase on helpers.
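One way to express TRUNC, and FRC from the commit above, using only float-to-int and int-to-float conversions is sketched below. This is an assumed lowering for illustration, not the vc4 code, and it is only valid when |x| < 2^31 so the integer conversion is defined; `lower_trunc()` and `lower_frc()` are hypothetical names.

```c
/* TRUNC: round toward zero via a float -> int -> float round trip.
 * Assumes |x| < 2^31 so the conversion to int is well defined. */
static float lower_trunc(float x)
{
   return (float)(int)x;
}

/* FRC: x - floor(x).  floor(x) equals trunc(x), minus 1 when truncation
 * rounded a negative non-integer value up. */
static float lower_frc(float x)
{
   float t = lower_trunc(x);
   float floor_x = t > x ? t - 1.0f : t;

   return x - floor_x;
}
```

Note that FRC of a negative value is non-negative (FRC(-2.75) is 0.25), which is why the floor adjustment is needed rather than simply subtracting the truncated value.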

Eric Anholt [Thu, 17 Jul 2014 05:45:41 +0000 (22:45 -0700)]
vc4: Crank up the tile allocation BO size

This avoids a simulator assertion failure with glamor.  I need to actually
support resize, though.

Eric Anholt [Thu, 17 Jul 2014 05:11:08 +0000 (22:11 -0700)]
vc4: Add support for multiple attributes

Eric Anholt [Wed, 16 Jul 2014 16:09:05 +0000 (09:09 -0700)]
vc4: Add more useful debug for the undefined-source case

We could get undefined sources in real programs from the wild, so we'll
need to turn off this debug eventually.  But for now, using undefined
sources is typically me just mistyping something.

Eric Anholt [Wed, 16 Jul 2014 16:08:48 +0000 (09:08 -0700)]
vc4: Add support for the lit opcode.

v2: Fix how it was using the X channel for the real work of the opcode,
    instead of Y.  Fixes glean's LIT test.
v3: Rebase on the helpers.

Eric Anholt [Wed, 16 Jul 2014 15:44:50 +0000 (08:44 -0700)]
vc4: Add support for the POW opcode

v2: Rebase on helpers.

Eric Anholt [Tue, 15 Jul 2014 18:46:20 +0000 (11:46 -0700)]
vc4: Refactor uniform handling.

I wanted an easy way to set up new uniforms every time, so I could handle
texture-sampler-related uniforms.

v2: Rebase on helpers change.

Eric Anholt [Tue, 15 Jul 2014 18:04:41 +0000 (11:04 -0700)]
vc4: Add support for the LRP opcode.

v2: Rebase on helpers, cutting out most of the code in this change.
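TGSI LRP computes dst = src0 * src1 + (1 - src0) * src2 per channel. A scalar sketch of the reference form next to an equivalent rearrangement with a single multiply; the rearranged form is a common lowering and an assumption here, not necessarily what the vc4 helpers emit.

```c
/* Reference form of TGSI LRP. */
static float lrp_ref(float a, float b, float c)
{
   return a * b + (1.0f - a) * c;
}

/* Algebraically equivalent lowering: c + a * (b - c). */
static float lrp_lowered(float a, float b, float c)
{
   return c + a * (b - c);
}
```

The two forms are equal in exact arithmetic but can round differently for general float inputs, so a backend comparing results bit-for-bit should expect small differences.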

Eric Anholt [Fri, 4 Jul 2014 17:59:42 +0000 (10:59 -0700)]
vc4: Add copy propagation between temps.

We put in a bunch of extra MOVs for program outputs, and this can clean
those up.  We should do uniforms, too, though.

v2: Fix missing flagging of progress when we actually optimize.  Caught by
    Aaron Watry.
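A minimal sketch of copy propagation between temps, assuming a toy SSA-style IR in which every temp is written exactly once; `struct qinst` and `copy_propagate()` are hypothetical and not the vc4 QIR structures. Sources that read the result of a MOV are rewritten to read the MOV's source, and the function reports progress whenever it rewrites something.

```c
#include <stdbool.h>

enum op { OP_MOV, OP_ADD, OP_MUL };

/* Toy SSA-style instruction: every temp is written exactly once. */
struct qinst {
   enum op op;
   int dst;
   int src[2]; /* src[1] unused for MOV */
};

/* Returns true if any source was rewritten. */
static bool copy_propagate(struct qinst *insts, int n, int num_temps)
{
   int movs_from[64];
   bool progress = false;

   if (num_temps > 64)
      return false;
   for (int t = 0; t < num_temps; t++)
      movs_from[t] = -1;

   for (int i = 0; i < n; i++) {
      struct qinst *inst = &insts[i];
      int nsrc = inst->op == OP_MOV ? 1 : 2;

      for (int s = 0; s < nsrc; s++) {
         int copy = movs_from[inst->src[s]];
         if (copy >= 0) {
            inst->src[s] = copy;
            progress = true; /* flag progress whenever we rewrite */
         }
      }
      /* Record after rewriting, so chains of MOVs collapse. */
      if (inst->op == OP_MOV)
         movs_from[inst->dst] = inst->src[0];
   }
   return progress;
}
```

After propagation the MOVs themselves are left unread, which is what makes them removable by a dead code pass.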

Eric Anholt [Fri, 4 Jul 2014 16:48:23 +0000 (09:48 -0700)]
vc4: Add dead code elimination.

This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
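A minimal sketch of dead code elimination over straight-line SSA code: walking backwards, an instruction is dead if it writes no shader output and no live instruction reads its result. The `struct dinst` type and `eliminate_dead_code()` are hypothetical, not vc4's QIR pass; the example below mimics a coordinate shader where a varying computation is never consumed.

```c
#include <stdbool.h>

struct dinst {
   int dst;        /* temp written */
   int src[2];     /* temps read, -1 if unused */
   bool is_output; /* writes a shader output: always live */
   bool dead;
};

/* Marks dead instructions and returns how many were found. */
static int eliminate_dead_code(struct dinst *insts, int n, int num_temps)
{
   bool used[64] = { false };
   int removed = 0;

   if (num_temps > 64)
      return 0;
   for (int i = n - 1; i >= 0; i--) {
      struct dinst *inst = &insts[i];

      if (!inst->is_output && !used[inst->dst]) {
         inst->dead = true;
         removed++;
         continue; /* a dead instruction's sources are not uses */
      }
      for (int s = 0; s < 2; s++) {
         if (inst->src[s] >= 0)
            used[inst->src[s]] = true;
      }
   }
   return removed;
}
```

Skipping the sources of dead instructions lets one backward pass remove whole chains, such as a varying computation and everything feeding only into it.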

Eric Anholt [Thu, 3 Jul 2014 20:18:49 +0000 (13:18 -0700)]
vc4: Add an initial pass of algebraic optimization.

There was a lot of extra noise in my piglit shader dumps because of silly
CMPs.
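As one example of the kind of "silly CMP" an algebraic pass can remove: TGSI CMP selects dst = src0 < 0 ? src1 : src2, so when both selectable sources are the same register the comparison cannot matter and the CMP becomes a MOV. The commit does not list which patterns the vc4 pass matches; this pattern, `struct alg_inst`, and `opt_algebraic()` are assumptions for illustration.

```c
#include <stdbool.h>

enum alg_op { ALG_MOV, ALG_CMP };

struct alg_inst {
   enum alg_op op;
   int dst;
   int src[3];
};

/* TGSI CMP semantics for one channel. */
static float eval_cmp(float a, float b, float c)
{
   return a < 0.0f ? b : c;
}

/* Rewrite CMPs whose two selectable sources are identical into MOVs.
 * Returns true if anything changed. */
static bool opt_algebraic(struct alg_inst *insts, int n)
{
   bool progress = false;

   for (int i = 0; i < n; i++) {
      struct alg_inst *inst = &insts[i];

      if (inst->op == ALG_CMP && inst->src[1] == inst->src[2]) {
         inst->op = ALG_MOV;
         inst->src[0] = inst->src[1];
         progress = true;
      }
   }
   return progress;
}
```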

Eric Anholt [Wed, 16 Jul 2014 15:12:27 +0000 (08:12 -0700)]
vc4: Add support for CMP.

This took a couple of tries, and this is the squash of those attempts.

v2: Fix register file conflicts on the args in the
    destination-is-accumulator case.
v3: Rebase on helper change and qir_inst4 change.

Eric Anholt [Fri, 4 Jul 2014 18:51:31 +0000 (11:51 -0700)]
vc4: Make scheduling of NOPs a separate step from QIR -> QPU translation.

This should also be used as a way to pair QIR instructions into QPU
instructions later.

Eric Anholt [Fri, 4 Jul 2014 16:38:44 +0000 (09:38 -0700)]
vc4: Add WIP support for varyings.

It doesn't do all the interpolation yet, but more tests can run now.

v2: Rebase on helpers.

Eric Anholt [Fri, 4 Jul 2014 17:23:50 +0000 (10:23 -0700)]
vc4: Use r3 instead of r5 for temps, since r5 only has 32 bits of storage

Reserving a whole accumulator for temps is awful in the first place, but
I'll fix that later.

10 years agovc4: Fix emit of ABS
Eric Anholt [Wed, 2 Jul 2014 17:43:50 +0000 (10:43 -0700)]
vc4: Fix emit of ABS

v2: Rebase on qir helpers.

10 years agovc4: Add shader variant caching to handle FS output swizzle.
Eric Anholt [Tue, 1 Jul 2014 21:42:42 +0000 (14:42 -0700)]
vc4: Add shader variant caching to handle FS output swizzle.

10 years agovc4: Load the tile buffer before incrementally drawing.
Eric Anholt [Tue, 1 Jul 2014 17:10:37 +0000 (10:10 -0700)]
vc4: Load the tile buffer before incrementally drawing.

We will want to occasionally disable this again when we do clear support.

v2: Squash with the previous commit (I accidentally committed at two
    stages of writing the change)

10 years agovc4: Don't reallocate the tile alloc/state bos every frame.
Eric Anholt [Sat, 28 Jun 2014 21:59:18 +0000 (14:59 -0700)]
vc4: Don't reallocate the tile alloc/state bos every frame.

This was a problem for the simulator since we don't free memory back to
it, and it would soon just run out.

10 years agovc4: Add VC4_DEBUG env option
Eric Anholt [Sat, 28 Jun 2014 21:36:26 +0000 (14:36 -0700)]
vc4: Add VC4_DEBUG env option

v2: Fix an accidental deletion of some characters from the copyright
    message (caught by Ilia Mirkin)

10 years agovc4: Add support for SNE/SEQ/SGE/SLT.
Eric Anholt [Sat, 28 Jun 2014 16:26:15 +0000 (17:26 +0100)]
vc4: Add support for SNE/SEQ/SGE/SLT.

10 years agovc4: Use the user's actual first vertex attribute.
Eric Anholt [Fri, 27 Jun 2014 15:32:03 +0000 (16:32 +0100)]
vc4: Use the user's actual first vertex attribute.

This is hardcoded to read it as RGBA32F so far, but starts to get more
tests working.

10 years agovc4: Fix UBO allocation when no uniforms are used.
Eric Anholt [Fri, 18 Jul 2014 23:29:18 +0000 (16:29 -0700)]
vc4: Fix UBO allocation when no uniforms are used.

We do rely on a real BO getting allocated, so make sure we ask for a non-zero size.

10 years agovc4: Add initial support for math opcodes
Eric Anholt [Wed, 16 Jul 2014 15:25:22 +0000 (08:25 -0700)]
vc4: Add initial support for math opcodes

10 years agovc4: Switch to actually generating vertex and fragment shader code from TGSI.
Eric Anholt [Thu, 26 Jun 2014 22:07:39 +0000 (23:07 +0100)]
vc4: Switch to actually generating vertex and fragment shader code from TGSI.

This introduces an IR (QIR, for QPU IR) to do optimization on.  It's a
scalar, SSA IR in general.  It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.

Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.

v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
    instructions into temporary values, and make qir_inst4 take the 4 args
    separately instead of an array (all later callers wanted individual
    args).
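The helper style from the v2 note can be sketched as follows. The names `emit_alu2`, `toy_FADD` and `toy_FMUL` are hypothetical stand-ins for the real qir_OPCODE helpers, whose exact signatures aren't given here. Each helper allocates a fresh temp, appends one instruction that writes it, and returns the temp, so emitted values nest like ordinary expressions.

```c
#include <assert.h>

/* Hypothetical toy IR for illustration; these are not vc4's QIR types. */
enum reg_file { REG_NULL, REG_TEMP, REG_UNIFORM };
enum opcode { OP_FADD, OP_FMUL };

struct reg {
    enum reg_file file;
    int index;
};

struct inst {
    enum opcode op;
    struct reg dst;
    struct reg src[2];
};

#define MAX_INSTS 64

struct compile {
    struct inst insts[MAX_INSTS];
    int num_insts;
    int num_temps;
};

/* Allocates a fresh temp; in SSA form every temp is written exactly once. */
static struct reg
get_temp(struct compile *c)
{
    return (struct reg){ REG_TEMP, c->num_temps++ };
}

/* Appends "dst = op a, b" into a new temp and returns that temp. */
static struct reg
emit_alu2(struct compile *c, enum opcode op, struct reg a, struct reg b)
{
    assert(c->num_insts < MAX_INSTS);
    struct reg dst = get_temp(c);
    c->insts[c->num_insts++] = (struct inst){ op, dst, { a, b } };
    return dst;
}

/* One helper per opcode, in the spirit of the qir_OPCODE helpers. */
#define DEF_ALU2(name)                                                  \
    static struct reg toy_##name(struct compile *c, struct reg a,      \
                                 struct reg b)                          \
    {                                                                   \
        return emit_alu2(c, OP_##name, a, b);                           \
    }

DEF_ALU2(FADD)
DEF_ALU2(FMUL)
```

For example, `toy_FADD(&c, toy_FMUL(&c, a, b), a)` emits the FMUL first, because C evaluates call arguments before the call. It then emits the FADD that reads the FMUL's result.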

10 years agovc4: Start converting the driver to use vertex shaders.
Eric Anholt [Tue, 24 Jun 2014 15:39:08 +0000 (16:39 +0100)]
vc4: Start converting the driver to use vertex shaders.

Note: This is the cutoff point where I switched from developing primarily
on the Pi to developing on the simulator.  As a result, from this point on
the code is untested on the Pi (the kernel code I have currently wasn't
rendering anything at this commit, though the simulator renders
successfully, suggesting kernel bugs).

10 years agovc4: Initial skeleton driver import.
Eric Anholt [Thu, 19 Jun 2014 07:19:38 +0000 (08:19 +0100)]
vc4: Initial skeleton driver import.

This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders.  I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those.  I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.

v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
    simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
    time, and dropping the dricommon drm bits) and fix a copyright header
    (thanks, Roland)

10 years agodraw: (trivial) use information about gs being present from variant key
Roland Scheidegger [Fri, 8 Aug 2014 16:17:18 +0000 (18:17 +0200)]
draw: (trivial) use information about gs being present from variant key

This is a purely cosmetic change.

Reviewed-by: Brian Paul <brianp@vmware.com>
10 years agodraw: don't use clipvertex output if user plane clipping is disabled
Roland Scheidegger [Sat, 9 Aug 2014 01:51:23 +0000 (03:51 +0200)]
draw: don't use clipvertex output if user plane clipping is disabled

The non-llvm path made sure that both clip and pre_clip_pos point to the data
output by position, not clipvertex, if user-based clipping is disabled.
However, the llvm path did not, which apparently led to failures if
gl_ClipVertex was written but user plane clipping was not enabled (bug 80183).
I don't really know why, but just make it match the non-llvm behavior.

Reviewed-by: Brian Paul <brianp@vmware.com>
10 years agoi965: Get rid of backend_instruction::sampler
Chris Forbes [Sun, 3 Aug 2014 09:40:00 +0000 (21:40 +1200)]
i965: Get rid of backend_instruction::sampler

The generators no longer use this.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/vec4/Gen8: Use src1 for sampler_index instead of ->sampler field
Chris Forbes [Mon, 4 Aug 2014 07:41:03 +0000 (19:41 +1200)]
i965/vec4/Gen8: Use src1 for sampler_index instead of ->sampler field

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/vec4/Gen4-7: Use src1 for sampler_index instead of ->sampler field
Chris Forbes [Mon, 4 Aug 2014 07:41:03 +0000 (19:41 +1200)]
i965/vec4/Gen4-7: Use src1 for sampler_index instead of ->sampler field

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/vec4: Pass sampler index in src1 for texture ops
Chris Forbes [Mon, 4 Aug 2014 07:37:58 +0000 (19:37 +1200)]
i965/vec4: Pass sampler index in src1 for texture ops

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/vec4: Collect all emits of texture ops into one place
Chris Forbes [Sun, 3 Aug 2014 10:01:11 +0000 (22:01 +1200)]
i965/vec4: Collect all emits of texture ops into one place

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/fs/Gen8: Pass sampler_index to generate_tex
Chris Forbes [Sun, 3 Aug 2014 09:23:31 +0000 (21:23 +1200)]
i965/fs/Gen8: Pass sampler_index to generate_tex

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/fs/Gen4-7: Pass sampler_index to generate_tex
Chris Forbes [Sun, 3 Aug 2014 09:23:31 +0000 (21:23 +1200)]
i965/fs/Gen4-7: Pass sampler_index to generate_tex

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/blorp: Put sampler index in src1 of texture ops
Chris Forbes [Sun, 3 Aug 2014 09:39:13 +0000 (21:39 +1200)]
i965/blorp: Put sampler index in src1 of texture ops

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>