mesa.git
8 years agoi965/blorp: Pipeline upload support for gen8
Topi Pohjolainen [Tue, 29 Mar 2016 07:50:42 +0000 (10:50 +0300)]
i965/blorp: Pipeline upload support for gen8

v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state
          Add emission of pma stall.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/gen8: Expose pma stall emission
Topi Pohjolainen [Thu, 21 Apr 2016 07:12:46 +0000 (10:12 +0300)]
i965/gen8: Expose pma stall emission

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Allow texture surface state setup to be used by blorp
Topi Pohjolainen [Thu, 7 Apr 2016 10:09:52 +0000 (13:09 +0300)]
i965: Allow texture surface state setup to be used by blorp

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp: Prepare sampling for gen9
Topi Pohjolainen [Fri, 1 Apr 2016 08:21:03 +0000 (11:21 +0300)]
i965/blorp: Prepare sampling for gen9

v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These
          were wrongly introduced in blit-enabling patch.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp: Prepare render target write for gen8
Topi Pohjolainen [Wed, 30 Mar 2016 17:50:41 +0000 (20:50 +0300)]
i965/blorp: Prepare render target write for gen8

v2 (Ken): Use payload directly instead of retyping it into vec8.
          Drop the implied header, it isn't used for gen6+ anyway.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp/gen6: Prepare vertex buffer setup logic for gen8
Topi Pohjolainen [Fri, 6 Mar 2015 12:21:25 +0000 (14:21 +0200)]
i965/blorp/gen6: Prepare vertex buffer setup logic for gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp/gen7: Expose state setup applicable to gen8
Topi Pohjolainen [Sun, 1 Mar 2015 20:38:59 +0000 (22:38 +0200)]
i965/blorp/gen7: Expose state setup applicable to gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp: Use 8k chunk size for urb allocation
Topi Pohjolainen [Fri, 15 Apr 2016 07:12:20 +0000 (10:12 +0300)]
i965/blorp: Use 8k chunk size for urb allocation

Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.

However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region.  This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.

v2 (Ken): Better description of the change, and do not change vs_size
          from 2 to 1.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp/gen7: Prepare re-using for gen8
Topi Pohjolainen [Fri, 6 Mar 2015 13:55:02 +0000 (15:55 +0200)]
i965/blorp/gen7: Prepare re-using for gen8

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/blorp: Let compiler calculate the vertex buffer size
Topi Pohjolainen [Tue, 12 Apr 2016 06:27:00 +0000 (09:27 +0300)]
i965/blorp: Let compiler calculate the vertex buffer size

Currently the size is sizeof(float) times too large. One reserves
GEN6_BLORP_VBO_SIZE many floats whereas GEN6_BLORP_VBO_SIZE stands
for the size of vertex buffer in bytes.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/gen8: Expose state base address setup
Topi Pohjolainen [Mon, 2 Mar 2015 09:29:05 +0000 (11:29 +0200)]
i965/gen8: Expose state base address setup

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/gen8: Expose surface state helpers
Topi Pohjolainen [Tue, 29 Mar 2016 08:36:23 +0000 (11:36 +0300)]
i965/gen8: Expose surface state helpers

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/gen9: Use correct size for DS_STATE
Topi Pohjolainen [Thu, 31 Mar 2016 07:19:24 +0000 (10:19 +0300)]
i965/gen9: Use correct size for DS_STATE

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: add forgotten textureOffset function for sampler2DArrayShadow
Roland Scheidegger [Tue, 19 Apr 2016 00:21:35 +0000 (02:21 +0200)]
glsl: add forgotten textureOffset function for sampler2DArrayShadow

This was part of EXT_gpu_shader4 - as such it should have been supported
by glsl 130.
It was however forgotten, and not added until glsl 430 - with the wrong
syntax no less (glsl 430 mentions it was overlooked).
glsl 440 (but revision 8 only) fixed this finally for good.
At least nvidia supports this with just version glsl version 1.30 as well
(the spec doesn't explicitly say it should be supported retroactively),
so just add this to the other glsl 130 textureOffset functions.

Passes a (hacked) piglit tex-miplevel-selection test (2DArrayShadow
textureOffset -auto) with llvmpipe.

v2: fix up comment (by Ian), add testing to commit message.

Reviewed-by: Dave Airlie <airlied@gmail.com>
8 years agoi965: Fix interpolateAtSample() on single sampled buffers.
Kenneth Graunke [Wed, 6 Apr 2016 05:32:45 +0000 (22:32 -0700)]
i965: Fix interpolateAtSample() on single sampled buffers.

Fixes dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_sample.non_multisample_buffer.sample_n_default_framebuffer
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_rbo
- interpolate_at_sample.non_multisample_buffer.sample_n_singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Fix gl_SampleMaskIn[] in per-sample shading mode.
Kenneth Graunke [Wed, 6 Apr 2016 03:14:22 +0000 (20:14 -0700)]
i965: Fix gl_SampleMaskIn[] in per-sample shading mode.

The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.

Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:

sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Only enable oMask output when there's a multisample FBO.
Kenneth Graunke [Tue, 5 Apr 2016 09:09:08 +0000 (02:09 -0700)]
i965: Only enable oMask output when there's a multisample FBO.

The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."

The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:

"No changes to the fragment alpha or coverage values are made at this
 step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
 is not one."

oMask output alters coverage masks and can kill pixels.  We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.

Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188

Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0

Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7

Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.
Kenneth Graunke [Wed, 6 Apr 2016 02:35:46 +0000 (19:35 -0700)]
i965: Generalize wm_key->compute_sample_id to wm_key->multisample_fbo.

I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch.  This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.

The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value.  So we
can just eliminate the check and generalize the field.

v2: Also update the Vulkan driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Delete now dead persample_2x FS program key flag.
Kenneth Graunke [Wed, 6 Apr 2016 02:33:04 +0000 (19:33 -0700)]
i965: Delete now dead persample_2x FS program key flag.

This was only used by the old gl_SampleID calculations.  The new code
doesn't need to handle 2x specially.

v2: Delete it from the Vulkan driver, too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Simplify gl_SampleID setup on Gen8+.
Kenneth Graunke [Wed, 6 Apr 2016 02:29:36 +0000 (19:29 -0700)]
i965: Simplify gl_SampleID setup on Gen8+.

On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations.  We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case.  Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.

For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet.  However, it works fine on Gen8+.  So, land the
code and use it where it's working, and leave a TODO for later.

v2: Fix register types in the comment (caught by Matt Turner!).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Flip key->compute_sample_id check.
Kenneth Graunke [Tue, 19 Apr 2016 01:11:01 +0000 (18:11 -0700)]
i965: Flip key->compute_sample_id check.

This just moves the simple case first.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agost/mesa: Use correct size for compute CAPs.
Bas Nieuwenhuizen [Wed, 20 Apr 2016 13:31:22 +0000 (15:31 +0200)]
st/mesa: Use correct size for compute CAPs.

Some CAPs are stored as 64-bit value while Mesa stores
the related constant as 32-bit value.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agoi965: Properly handle integer types in opt_vector_float().
Kenneth Graunke [Wed, 13 Apr 2016 23:58:10 +0000 (16:58 -0700)]
i965: Properly handle integer types in opt_vector_float().

Previously, opt_vector_float() always interpreted MOV sources as
floating point, and always created a MOV with a F-type destination.

This meant that we could mess up sequences of integer loads, such as:

   mov vgrf6.0.x:D, 0D
   mov vgrf6.0.y:D, 1D
   mov vgrf6.0.z:D, 2D
   mov vgrf6.0.w:D, 3D

Here, integer 0/1/2/3 become approximately 0.0f, so we generated:

   mov vgrf6.0:F, [0F, 0F, 0F, 0F]

which is clearly wrong.  We can properly handle this by converting
integer values to float (rather than bitcasting), and emitting a type
converting MOV:

   mov vgrf6.0:D, [0F, 1F, 2F, 3F]

To do this, see first see if the integer values (converted to float)
are representable.  If so, we use a D-type MOV.  If not, we then try
the floating point values and an F-type MOV.  We make zero not impose
type restrictions.  This is important because 0D would imply a D-type
MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D,
where we want to use an F-type MOV.

Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend.  This
recently became visible due to changes in opt_vector_float() which
made it optimize more cases, but it was a pre-existing bug.

Apparently it also manages to turn more integer loads into VFs,
producing the following shader-db statistics on Haswell:

total instructions in shared programs: 7084195 -> 7082191 (-0.03%)
instructions in affected programs: 246027 -> 244023 (-0.81%)
helped: 1937

total cycles in shared programs: 65669642 -> 65651968 (-0.03%)
cycles in affected programs: 531064 -> 513390 (-3.33%)
helped: 1177

v2: Handle the type of zero better.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Make opt_vector_float() only handle non-type-conversion MOVs.
Kenneth Graunke [Wed, 13 Apr 2016 23:39:54 +0000 (16:39 -0700)]
i965: Make opt_vector_float() only handle non-type-conversion MOVs.

We don't handle this properly - we'd have to perform the type conversion
before trying to convert the value to a VF.

While we could do that, it doesn't seem particularly useful - most
vector loads should be consistently typed (all float or all integer).

As a special case, we do allow type-converting MOVs of integer 0, as
it's represented the same regardless of the type.  I believe this case
does actually come up.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Fold vectorize_mov() back into the one caller.
Kenneth Graunke [Wed, 13 Apr 2016 23:04:04 +0000 (16:04 -0700)]
i965: Fold vectorize_mov() back into the one caller.

After the previous patch, this helper is only called in one place.
So, just fold it back in - there are a lot of parameters here and
not much code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoi965: Rework opt_vector_float() control flow.
Kenneth Graunke [Wed, 13 Apr 2016 22:56:07 +0000 (15:56 -0700)]
i965: Rework opt_vector_float() control flow.

This reworks opt_vector_float() so that there's only one place that
flushes out any accumulated state and emits a VF.

v2: Don't break the sequence for non-representable numbers - just skip
    recording their values.  Only break it for non-MOVs or register
    changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoanv: s/anv_batch_emit_blk/anv_batch_emit/
Jason Ekstrand [Tue, 19 Apr 2016 00:03:00 +0000 (17:03 -0700)]
anv: s/anv_batch_emit_blk/anv_batch_emit/

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv: Remove the old emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:53:11 +0000 (16:53 -0700)]
anv: Remove the old emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen7_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:51:12 +0000 (16:51 -0700)]
anv/gen7_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen7_cmd_buffer: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:33:46 +0000 (16:33 -0700)]
anv/gen7_cmd_buffer: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/device: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:32:29 +0000 (15:32 -0700)]
anv/device: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/state: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:24:59 +0000 (15:24 -0700)]
anv/state: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen8_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:53:14 +0000 (15:53 -0700)]
anv/gen8_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/genX_pipeline: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 22:29:42 +0000 (15:29 -0700)]
anv/genX_pipeline: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/gen8_cmd_buffer: Use the new emit macro
Jason Ekstrand [Mon, 18 Apr 2016 23:08:49 +0000 (16:08 -0700)]
anv/gen8_cmd_buffer: Use the new emit macro

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for quaries
Jason Ekstrand [Mon, 18 Apr 2016 22:20:06 +0000 (15:20 -0700)]
anv/cmd_buffer: Use the new emit macro for quaries

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for DRAWING_RECTANGLE
Jason Ekstrand [Mon, 18 Apr 2016 22:14:47 +0000 (15:14 -0700)]
anv/cmd_buffer: Use the new emit macro for DRAWING_RECTANGLE

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for compute shader dispatch
Jason Ekstrand [Mon, 18 Apr 2016 22:11:43 +0000 (15:11 -0700)]
anv/cmd_buffer: Use the new emit macro for compute shader dispatch

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for 3DSTATE_CONSTANT
Jason Ekstrand [Mon, 18 Apr 2016 21:55:10 +0000 (14:55 -0700)]
anv/cmd_buffer: Use the new emit macro for 3DSTATE_CONSTANT

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for DEPTH/STENCIL_BUFFER
Jason Ekstrand [Mon, 18 Apr 2016 21:48:33 +0000 (14:48 -0700)]
anv/cmd_buffer: Use the new emit macro for DEPTH/STENCIL_BUFFER

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for PIPE_CONTROL and STATE_BASE_ADDRESS
Jason Ekstrand [Mon, 18 Apr 2016 21:41:06 +0000 (14:41 -0700)]
anv/cmd_buffer: Use the new emit macro for PIPE_CONTROL and STATE_BASE_ADDRESS

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv/cmd_buffer: Use the new emit macro for 3DPRIMITIVE commands
Jason Ekstrand [Mon, 18 Apr 2016 21:27:29 +0000 (14:27 -0700)]
anv/cmd_buffer: Use the new emit macro for 3DPRIMITIVE commands

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agoanv: Add a new block-based batch emit macro
Jason Ekstrand [Mon, 18 Apr 2016 21:25:03 +0000 (14:25 -0700)]
anv: Add a new block-based batch emit macro

This new macro uses a for loop to create an actual code block in which to
place the macro setup code.  One advantage of this is that you syntatically
use braces instead of parentheses.  Another is that the code in the block
doesn't even get executed if anv_batch_emit_dwords fails.

Acked-by: Kristian Høgsberg <krh@bitplanet.net>
8 years agogk110/ir: make use of IMUL32I for all immediates
Samuel Pitoiset [Wed, 20 Apr 2016 17:06:24 +0000 (19:06 +0200)]
gk110/ir: make use of IMUL32I for all immediates

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years agogk110/ir: do not overwrite def value with zero for EXCH ops
Samuel Pitoiset [Wed, 20 Apr 2016 17:47:37 +0000 (19:47 +0200)]
gk110/ir: do not overwrite def value with zero for EXCH ops

This is only valid for other atomic operations (including CAS). This
fixes an invalid opcode error from dmesg. While we are it, make sure
to initialize global addr to 0 for other atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
8 years agoanv: fix build without Wayland platform
Marcin Ślusarz [Sat, 16 Apr 2016 20:48:09 +0000 (22:48 +0200)]
anv: fix build without Wayland platform

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoanv: fix building on i686 with -mcpu=generic
Laurent Carlier [Sat, 16 Apr 2016 19:50:39 +0000 (21:50 +0200)]
anv: fix building on i686 with -mcpu=generic

mcpu=generic doesn't enable sse2, and anvil definitly needs it

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agospirv: Trivially handle the NonWriteable decoration
Jason Ekstrand [Wed, 20 Apr 2016 17:32:59 +0000 (10:32 -0700)]
spirv: Trivially handle the NonWriteable decoration

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir: rename nir_foreach_block*() to nir_foreach_block*_call()
Connor Abbott [Wed, 13 Apr 2016 20:25:34 +0000 (16:25 -0400)]
nir: rename nir_foreach_block*() to nir_foreach_block*_call()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonvc0: avoid tex read fault from compute shaders on GK110
Samuel Pitoiset [Sun, 10 Apr 2016 20:08:34 +0000 (22:08 +0200)]
nvc0: avoid tex read fault from compute shaders on GK110

After some investigation, it seems like that disabling the UNK02C4
command avoid a read fault with texelFetch() from a compute shader.

I have no clue on what this method actually does, but this avoid the
GPU to hang with basic-texelFetch.shader_test without introducing any
compute-related regressions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agoi965/vec4: Always split uniforms in array_access_to_pull_constants
Jason Ekstrand [Tue, 19 Apr 2016 01:57:07 +0000 (18:57 -0700)]
i965/vec4: Always split uniforms in array_access_to_pull_constants

Normally, we split uniforms at the end but in Vulkan, we bail because we
don't want pull constants.  However, we still need them split because
pack_uniforms relies on it.

I really don't like this patch not because it doesn't work (it does) but
because now that we're using MOV_INDIRECT, uniform numbers and sizes don't
really matter anymore.  In the FS backend, uniform splitting and packing is
handled all at once (actual re-assignment of locations happens later) and
we really should do it that way in vec4 eventually as well.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001

8 years agoi965/vec4: Use the correct offset for the swizzle shift in push constants
Jason Ekstrand [Tue, 19 Apr 2016 01:52:36 +0000 (18:52 -0700)]
i965/vec4: Use the correct offset for the swizzle shift in push constants

This was actually caught by Ken in review the first time around but somehow
didn't get fixed before the patches were pushed. :-(

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001

8 years agoi965/vec4: Use nir_intrinsic_base in the load_uniform implementation
Jason Ekstrand [Tue, 19 Apr 2016 01:51:50 +0000 (18:51 -0700)]
i965/vec4: Use nir_intrinsic_base in the load_uniform implementation

We shouldn't be reading the const_index directly

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001

8 years agoanv/apply_dynamic_offsets: Provide a range on the load_uniform
Jason Ekstrand [Tue, 19 Apr 2016 00:55:21 +0000 (17:55 -0700)]
anv/apply_dynamic_offsets: Provide a range on the load_uniform

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001

8 years agoanv/lower_push_constants: Stop treating scalar specially
Jason Ekstrand [Tue, 19 Apr 2016 00:17:31 +0000 (17:17 -0700)]
anv/lower_push_constants: Stop treating scalar specially

All of the code that did something special based on vec4 vs. scalar is
bogus.  In the backend, everything is now in units of bytes and the vec4
backend can handle full std140 packing so we don't need to do anything
special anymore.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998

8 years agoswr: fix resource backed constant buffers
Tim Rowley [Mon, 18 Apr 2016 23:10:39 +0000 (18:10 -0500)]
swr: fix resource backed constant buffers

Code was using an incorrect address for the base pointer.

v2: use swr_resource_data() utility function.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94979
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-by: Markus Wick <markus@selfnet.de>
8 years agonouveau: codegen: Add support for OpenCL global memory buffers
Hans de Goede [Mon, 14 Mar 2016 12:57:07 +0000 (13:57 +0100)]
nouveau: codegen: Add support for OpenCL global memory buffers

Add support for OpenCL global memory buffers, note this has only
been tested with regular load and stores and likely needs more work
for e.g. atomic ops.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agonouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
Hans de Goede [Thu, 17 Mar 2016 14:50:00 +0000 (15:50 +0100)]
nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers

Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.

This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.

Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.

Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agoscons: Build dri_common_interop.c.
Jose Fonseca [Wed, 20 Apr 2016 11:41:24 +0000 (12:41 +0100)]
scons: Build dri_common_interop.c.

8 years agost/dri: implement the GL interop DRI extension (v2.2)
Marek Olšák [Mon, 29 Feb 2016 22:25:59 +0000 (23:25 +0100)]
st/dri: implement the GL interop DRI extension (v2.2)

v2: - set interop_version
    - simplify the offset_after macro
v2.1: - use version numbers, remove offset_after
      - set "out_driver_data_written"
v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too
      - add whandle.offset to buf_offset
      - disable the minmax cache for GL_TEXTURE_BUFFER

8 years agoglx: implement GLX part of interop interface (v2)
Marek Olšák [Thu, 3 Mar 2016 17:43:53 +0000 (18:43 +0100)]
glx: implement GLX part of interop interface (v2)

v2: - use const

8 years agoegl: implement EGL part of interop interface (v2)
Marek Olšák [Thu, 3 Mar 2016 14:59:48 +0000 (15:59 +0100)]
egl: implement EGL part of interop interface (v2)

v2: - use const

8 years agodri_interface: add interface for GL interop with other APIs (v2)
Marek Olšák [Mon, 29 Feb 2016 22:19:18 +0000 (23:19 +0100)]
dri_interface: add interface for GL interop with other APIs (v2)

v2: - use const

8 years agoinclude/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v4.2)
Marek Olšák [Fri, 26 Feb 2016 20:21:31 +0000 (21:21 +0100)]
include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v4.2)

v2: - use "enum" to define stuff
v3: - more comments, define MESA_GLINTEROP_UNSUPPORTED
v4: - add mesa_glinterop_device_info::interop_version
    - more comments
    - remove #define MESA_GLINTEROP_VERSION
    - use const for "in"
v4.1: - use version numbers for structures
      - add "out_driver_data_written"
v4.2: - buf_offset & buf_size affect GL_ARRAY_BUFFER too, this is required
        for sharing suballocations within a larger buffer

8 years agost/dri: Fix RGB565 EGLImage creation
Nicolas Dufresne [Sun, 17 Apr 2016 00:49:44 +0000 (20:49 -0400)]
st/dri: Fix RGB565 EGLImage creation

When creating egl images we do a bytes to pixel conversion by deviding
by 4 regardless of the pixel format. This does not work for RGB565. In
this patch, we avoid useless conversion and use proper API when the
conversion cannot be avoided.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agost/dri: Factor out DRI2 to PIPE_FORMAT conversion
Nicolas Dufresne [Sun, 17 Apr 2016 00:49:43 +0000 (20:49 -0400)]
st/dri: Factor out DRI2 to PIPE_FORMAT conversion

This code is already duplicated twice and will be useful again. This
will also help when adding formats.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agofreedreno/a4xx: lower srgb in shader for astc textures
Rob Clark [Tue, 19 Apr 2016 13:02:23 +0000 (09:02 -0400)]
freedreno/a4xx: lower srgb in shader for astc textures

This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions.  But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures.  Work around this
by doing the srgb->linear conversion in the shader instead.

This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*

(The remaining fails seem to be a bug w/ ASTC + linear filtering, also
possibly a420.0 specific.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agonir/lower-tex: add srgb->linear lowering
Rob Clark [Tue, 19 Apr 2016 12:28:22 +0000 (08:28 -0400)]
nir/lower-tex: add srgb->linear lowering

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir/builder: const'ify swiz param
Rob Clark [Tue, 19 Apr 2016 19:44:25 +0000 (15:44 -0400)]
nir/builder: const'ify swiz param

No need for it not to be const, and lets caller declare it const if
desired.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years agonir/lower-tex: make options a local var
Rob Clark [Tue, 19 Apr 2016 11:46:50 +0000 (07:46 -0400)]
nir/lower-tex: make options a local var

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: cleanup fd_set_sampler_views
Rob Clark [Tue, 19 Apr 2016 19:52:18 +0000 (15:52 -0400)]
freedreno: cleanup fd_set_sampler_views

The separate FS/VS entrypoints are no longer used since a3ed98f.  So
just inline them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agotgsi/lowering: improved lowering for LRP
Russell King [Wed, 13 Apr 2016 22:42:42 +0000 (18:42 -0400)]
tgsi/lowering: improved lowering for LRP

Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agotgsi/lowering: improved lowering for XPD
Russell King [Wed, 13 Apr 2016 22:42:41 +0000 (18:42 -0400)]
tgsi/lowering: improved lowering for XPD

Improve XPD lowering to consume less instructions by using the
MAD instruction to perform the multiply and subtraction together.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agotgsi/lowering: add support for lowering TRUNC
Russell King [Wed, 13 Apr 2016 22:42:40 +0000 (18:42 -0400)]
tgsi/lowering: add support for lowering TRUNC

Add support for lowering TRUNC using the following sequence:

FRC tmpA, |src|
SUB tmpA, |src|, tmpA
CMP dst, -tmpA, tmpA

Note that this is incompatible with FRC lowering.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agotgsi/lowering: add support for lowering FLR and CEIL
Russell King [Wed, 13 Apr 2016 22:42:39 +0000 (18:42 -0400)]
tgsi/lowering: add support for lowering FLR and CEIL

Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL.  Since
these uses FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.

We also need to deal with FLR instructions emitted by the lowering
code.  Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.

Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoradeonsi: enable TGSI support cap for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 14:16:50 +0000 (15:16 +0100)]
radeonsi: enable TGSI support cap for compute shaders

v2: Use chip_class instead of family.

v3: Check kernel version for SI.

v4: Preemptively allow amdgpu winsys for SI.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Consider input SGPR count for compute shader SGPR count.
Bas Nieuwenhuizen [Tue, 19 Apr 2016 12:08:13 +0000 (14:08 +0200)]
radeonsi: Consider input SGPR count for compute shader SGPR count.

si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Add CE synchronization for compute dispatches.
Bas Nieuwenhuizen [Tue, 19 Apr 2016 11:52:32 +0000 (13:52 +0200)]
radeonsi: Add CE synchronization for compute dispatches.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agomesa/st: enable compute shaders if images are also supported
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:39:54 +0000 (13:39 +0200)]
mesa/st: enable compute shaders if images are also supported

v2: Also depend on atomic counters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: clean up compute flush
Bas Nieuwenhuizen [Sat, 2 Apr 2016 09:37:06 +0000 (11:37 +0200)]
radeonsi: clean up compute flush

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: do not do two full flushes on every compute dispatch
Bas Nieuwenhuizen [Sun, 27 Mar 2016 09:14:34 +0000 (11:14 +0200)]
radeonsi: do not do two full flushes on every compute dispatch

v2: Add more CS_PARTIAL_FLUSH events.

Essentially every place with waits on finishing for pixel shaders
also has a write after read hazard with compute shaders.

Invalidating L2 waits implicitly on pixel and compute shaders,
so, we don't need a CS_PARTIAL_FLUSH for switching FBO.

v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.

According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: split setting graphics and compute descriptors
Bas Nieuwenhuizen [Sat, 19 Mar 2016 12:56:29 +0000 (13:56 +0100)]
radeonsi: split setting graphics and compute descriptors

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: split texture decompression for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 17:41:20 +0000 (18:41 +0100)]
radeonsi: split texture decompression for compute shaders

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: update predicate condition for compute dispatches
Bas Nieuwenhuizen [Tue, 5 Apr 2016 15:38:38 +0000 (17:38 +0200)]
radeonsi: update predicate condition for compute dispatches

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement TGSI compute dispatch
Bas Nieuwenhuizen [Sat, 19 Mar 2016 14:15:20 +0000 (15:15 +0100)]
radeonsi: implement TGSI compute dispatch

v2: - Use radeon_set_sh_reg_seq.
    - Set predicate bit for conditional rendering.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: only emit compute shader state when switching shaders
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:19:42 +0000 (13:19 +0200)]
radeonsi: only emit compute shader state when switching shaders

v2: - Do check if anything changed earlier
    - Use emitted_program instead of emitted_bo to prevent
      shaders with shader->bo = NULL confusing the check
    - Use radeon_set_sh_reg*

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: rework compute scratch buffer
Bas Nieuwenhuizen [Sat, 2 Apr 2016 11:04:18 +0000 (13:04 +0200)]
radeonsi: rework compute scratch buffer

Instead of having a scratch buffer per program, have one per
context.

Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.

v2: Fix style issue.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: do per cs setup for compute shaders once per cs
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:35:36 +0000 (12:35 +0200)]
radeonsi: do per cs setup for compute shaders once per cs

Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.

v2: - Use radeon_set_sh_reg_seq.
    - Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: don't pass scratch buffer to user SGPRs
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:48:05 +0000 (12:48 +0200)]
radeonsi: don't pass scratch buffer to user SGPRs

As far as I can see we use relocations for clover too.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: split input upload off from si_launch_grid
Bas Nieuwenhuizen [Sat, 2 Apr 2016 10:04:15 +0000 (12:04 +0200)]
radeonsi: split input upload off from si_launch_grid

Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.

v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement TGSI compute shader creation
Bas Nieuwenhuizen [Sat, 19 Mar 2016 13:07:33 +0000 (14:07 +0100)]
radeonsi: implement TGSI compute shader creation

v2: Moved scratch_enabled initialization after compile.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: update shader count for compute shaders
Bas Nieuwenhuizen [Sat, 19 Mar 2016 12:54:55 +0000 (13:54 +0100)]
radeonsi: update shader count for compute shaders

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: set maximum work group size based on block size
Bas Nieuwenhuizen [Mon, 28 Mar 2016 01:01:56 +0000 (03:01 +0200)]
radeonsi: set maximum work group size based on block size

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: implement shared atomics
Bas Nieuwenhuizen [Tue, 29 Mar 2016 11:20:26 +0000 (13:20 +0200)]
radeonsi: implement shared atomics

v2: - Use single region
    - Use get_memory_ptr

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: implement shared memory load/store
Bas Nieuwenhuizen [Tue, 29 Mar 2016 11:17:40 +0000 (13:17 +0200)]
radeonsi: implement shared memory load/store

v2: - Use single region
    - Combine address calculation

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: add shared memory
Bas Nieuwenhuizen [Tue, 29 Mar 2016 15:51:49 +0000 (17:51 +0200)]
radeonsi: add shared memory

Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.

v2: - Use ctx->i8.
    - Dropped null-check for declare_memory_region.
    - Changed memory region array to single region.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
8 years agoradeonsi: lower compute shader arguments
Bas Nieuwenhuizen [Thu, 17 Mar 2016 13:12:21 +0000 (14:12 +0100)]
radeonsi: lower compute shader arguments

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradeonsi: Use CE for all descriptors.
Bas Nieuwenhuizen [Thu, 10 Mar 2016 20:39:20 +0000 (21:39 +0100)]
radeonsi: Use CE for all descriptors.

v2: Load previous list for new CS instead of re-emitting
    all descriptors.

v3: Do radeon_add_to_buffer_list in si_ce_upload.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/util: Add u_bit_scan_consecutive_range64.
Bas Nieuwenhuizen [Wed, 13 Apr 2016 21:30:55 +0000 (23:30 +0200)]
gallium/util: Add u_bit_scan_consecutive_range64.

For use by radeonsi.

v2: Make sure that it works for all 64 bits set.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoradeonsi: Replace list_dirty with a mask.
Bas Nieuwenhuizen [Thu, 14 Apr 2016 23:00:41 +0000 (01:00 +0200)]
radeonsi: Replace list_dirty with a mask.

We can then upload only the dirty ones with the constant engine.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>