mesa.git
9 years ago  main: Remove interface block array index for doing the name comparison
Samuel Iglesias Gonsalvez [Wed, 21 Oct 2015 12:34:29 +0000 (14:34 +0200)]
main: Remove interface block array index for doing the name comparison

From ARB_program_interface_query spec:

"uint GetProgramResourceIndex(uint program, enum programInterface,
                                   const char *name);
 [...]
 If <name> exactly matches the name string of one of the active resources
 for <programInterface>, the index of the matched resource is returned.
 Additionally, if <name> would exactly match the name string of an active
 resource if "[0]" were appended to <name>, the index of the matched
 resource is returned. [...]"

"A string provided to GetProgramResourceLocation or
 GetProgramResourceLocationIndex is considered to match an active variable
 if:
[...]
   * if the string identifies the base name of an active array, where the
     string would exactly match the name of the variable if the suffix
     "[0]" were appended to the string;
[...]
"

Fixes the following two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Add AoA support (Timothy)
- Also apply it to GetUniformLocation(), GetUniformName() and others
  because ARB_program_interface_query says that they are equivalent
  to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)
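
A minimal sketch of the matching rule quoted from the spec above. The helper
name resource_name_matches() is hypothetical and this is not the code Mesa
uses; it only shows that a query matches either exactly or when appending
"[0]" to it would produce the resource name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for illustration only; not Mesa's implementation.
 * Returns true if query names the resource exactly, or if query + "[0]"
 * would equal resource_name.
 */
static bool
resource_name_matches(const char *resource_name, const char *query)
{
   size_t query_len = strlen(query);

   /* Case 1: exact match. */
   if (strcmp(resource_name, query) == 0)
      return true;

   /* Case 2: resource_name is query followed by exactly "[0]". */
   return strncmp(resource_name, query, query_len) == 0 &&
          strcmp(resource_name + query_len, "[0]") == 0;
}
```

With this rule, both "block" and "block[0]" resolve to a resource named
"block[0]", while "block" does not resolve to "block[1]".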

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years ago  vc4: Add support for copy propagation with unpack flags present.
Eric Anholt [Thu, 6 Aug 2015 03:05:56 +0000 (20:05 -0700)]
vc4: Add support for copy propagation with unpack flags present.

total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs:     52971 -> 51582 (-2.62%)

9 years ago  vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Eric Anholt [Mon, 26 Oct 2015 20:45:06 +0000 (13:45 -0700)]
vc4: Rewrite the pack instructions as a MOV with a dst pack flag

Another step in reducing the special-casing of instructions.

9 years ago  vc4: Move dst pack setup out to a helper function with more asserts.
Eric Anholt [Mon, 26 Oct 2015 21:16:19 +0000 (14:16 -0700)]
vc4: Move dst pack setup out to a helper function with more asserts.

9 years ago  vc4: Switch the unpack ops to being unpack flags on a mov.
Eric Anholt [Sun, 25 Oct 2015 00:35:03 +0000 (17:35 -0700)]
vc4: Switch the unpack ops to being unpack flags on a mov.

This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.

9 years ago  vc4: Drop some confused code about pack/unpack handling.
Eric Anholt [Mon, 26 Oct 2015 20:22:18 +0000 (13:22 -0700)]
vc4: Drop some confused code about pack/unpack handling.

At one point I thought packs and unpacks were in the same field of the
instruction.  They aren't.  These instructions therefore never cause a
pack.

total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs:     15261 -> 15179 (-0.54%)

9 years ago  vc4: Reduce MOV special-casing in QIR-to-QPU.
Eric Anholt [Mon, 26 Oct 2015 21:07:44 +0000 (14:07 -0700)]
vc4: Reduce MOV special-casing in QIR-to-QPU.

I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.

9 years ago  vc4: Fix up the test for whether the unpack can be from r4.
Eric Anholt [Mon, 26 Oct 2015 20:17:33 +0000 (13:17 -0700)]
vc4: Fix up the test for whether the unpack can be from r4.

We can do 16a/16b from float as well.  No difference on shader-db.

9 years ago  vc4: Don't try to follow MOVs across a pack.
Eric Anholt [Mon, 26 Oct 2015 20:57:57 +0000 (13:57 -0700)]
vc4: Don't try to follow MOVs across a pack.

9 years ago  vc4: Only copy propagate raw MOVs.
Eric Anholt [Sun, 25 Oct 2015 00:49:03 +0000 (17:49 -0700)]
vc4: Only copy propagate raw MOVs.

No problems being fixed, but needed for the new unpack changes.

9 years ago  vc4: If a QIR source has an unpack set, print it.
Eric Anholt [Sun, 25 Oct 2015 00:33:30 +0000 (17:33 -0700)]
vc4: If a QIR source has an unpack set, print it.

Not used yet, but will be.

9 years ago  glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.
Kenneth Graunke [Wed, 29 Jul 2015 01:16:37 +0000 (18:16 -0700)]
glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.

When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers.  So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.

If the TCS is absent, we can't do this optimization.
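
A minimal sketch of the decision described above, assuming a hypothetical
linked_program_info struct (the field and function names are illustrative,
not Mesa's): the system value can only be folded when a TCS is present,
because only then is the vertex count fixed by the TCS layout qualifier.

```c
#include <stdbool.h>

/* Hypothetical summary of a linked program; names are assumptions. */
struct linked_program_info {
   bool has_tcs;          /* is a tessellation control shader present? */
   int tcs_vertices_out;  /* value of "layout(vertices = N) out" in the TCS */
};

/* Returns true and stores the constant in *value when the TES input
 * gl_PatchVerticesIn can be replaced by a compile-time constant.
 */
static bool
patch_vertices_in_constant(const struct linked_program_info *prog, int *value)
{
   /* Without a TCS the count is not known at link time, so the
    * system value has to stay.
    */
   if (!prog->has_tcs)
      return false;

   *value = prog->tcs_vertices_out;
   return true;
}
```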

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  i965: Add missing close-parenthesis in error messages
Ian Romanick [Fri, 16 Oct 2015 16:18:24 +0000 (09:18 -0700)]
i965: Add missing close-parenthesis in error messages

Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago  i965: Fix is-renderable check in intel_image_target_renderbuffer_storage
Ian Romanick [Thu, 15 Oct 2015 19:50:12 +0000 (12:50 -0700)]
i965: Fix is-renderable check in intel_image_target_renderbuffer_storage

Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

    ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
9 years ago  glsl: keep track of intra-stage indices for atomics
Timothy Arceri [Mon, 26 Oct 2015 19:58:15 +0000 (06:58 +1100)]
glsl: keep track of intra-stage indices for atomics

This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug: since commit c0cd5b, var->data.binding was being
used as a replacement for the atomic buffer index, but the two don't have
to be the same value; they just happened to match when binding is 0.
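
To illustrate why a binding point and a buffer index are different things,
here is a small sketch. The function and its array of active bindings are
hypothetical, not Mesa data structures: the index is the position of a
buffer in the list of active atomic buffers, while the binding is whatever
the shader declared.

```c
/* Hypothetical lookup for illustration: active_bindings[i] holds the
 * binding point of the i-th active atomic counter buffer. Returns the
 * buffer index for a binding, or -1 if no active buffer uses it.
 */
static int
atomic_buffer_index(const unsigned *active_bindings, int num_buffers,
                    unsigned binding)
{
   for (int i = 0; i < num_buffers; i++) {
      if (active_bindings[i] == binding)
         return i;
   }
   return -1;
}
```

With active bindings {3, 7}, binding 3 maps to index 0 and binding 7 to
index 1, so using the binding directly as an index would be wrong. With a
single buffer at binding 0 the two values coincide, which is how such a bug
can stay hidden.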

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175

9 years ago  gallivm: disable f16c when not using AVX
Roland Scheidegger [Mon, 26 Oct 2015 15:44:47 +0000 (16:44 +0100)]
gallivm: disable f16c when not using AVX

The f16c intrinsic can only be emitted when AVX is used, so when we disable
AVX due to forcing 128bit vectors we must not use this intrinsic. Depending on
the llvm version this worked previously, because llvm used AVX even when we
didn't tell it to. However, I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934, which seems to have the side effect
of disabling avx in llvm although it really only touches sse flags, and with
ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled.
Although being able to use AVX with 128bit vectors would also have its uses,
the code as is was really meant to emulate jit code creation for less capable
cpus.

v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
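
A minimal sketch of the dependency described above, under the assumption
that forcing vectors of 128 bits or less is what turns AVX off. The
cpu_caps struct and effective_caps() are hypothetical names, not gallivm
code.

```c
#include <stdbool.h>

/* Hypothetical capability flags; not the real gallivm/util structures. */
struct cpu_caps {
   bool has_avx;
   bool has_f16c;
};

/* Derive the features the JIT may use from the detected features and the
 * vector width we were asked to use.
 */
static struct cpu_caps
effective_caps(struct cpu_caps detected, unsigned vector_width_bits)
{
   struct cpu_caps caps = detected;

   /* Forcing 128bit vectors means we behave like a CPU without AVX. */
   if (vector_width_bits <= 128)
      caps.has_avx = false;

   /* f16c can only be emitted when AVX is used. */
   if (!caps.has_avx)
      caps.has_f16c = false;

   return caps;
}
```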

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago  st/va: pass picture desc to begin and decode
Julien Isorce [Fri, 23 Oct 2015 12:25:47 +0000 (13:25 +0100)]
st/va: pass picture desc to begin and decode

At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.

https://bugs.freedesktop.org/show_bug.cgi?id=92634

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
9 years ago  mesa: add additional checks for uniform location query
Tapani Pälli [Mon, 26 Oct 2015 09:13:14 +0000 (11:13 +0200)]
mesa: add additional checks for uniform location query

Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.

From page 79 of the OpenGL 4.2 spec:
    "A valid name cannot be a structure, an array of structures, or any
    portion of a single vector or a matrix."

v2: use without_array() to simplify code (Timothy)

No Piglit or CTS regressions observed.
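
A small sketch of the check described above, using a toy type model. The
type struct, strip_arrays() and may_have_uniform_location() are all
hypothetical stand-ins (strip_arrays plays the role of the array-stripping
helper mentioned in the v2 note), not the GLSL compiler's real types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy type model for illustration only. */
enum base_type { TYPE_FLOAT, TYPE_STRUCT, TYPE_ARRAY };

struct type {
   enum base_type base;
   const struct type *element;  /* element type when base == TYPE_ARRAY */
};

/* Peel off every array level and return the innermost element type. */
static const struct type *
strip_arrays(const struct type *t)
{
   while (t->base == TYPE_ARRAY)
      t = t->element;
   return t;
}

/* A structure or an array of structures is not a valid name for a
 * location query, so only non-struct innermost types qualify.
 */
static bool
may_have_uniform_location(const struct type *t)
{
   return strip_arrays(t)->base != TYPE_STRUCT;
}
```

Stripping the arrays first lets one test cover both the "structure" and the
"array of structures" case from the spec quote.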

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
9 years ago  docs: add news item and link release notes for 11.0.4
Emil Velikov [Sun, 25 Oct 2015 10:17:08 +0000 (10:17 +0000)]
docs: add news item and link release notes for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
9 years ago  docs: add sha256 checksums for 11.0.4
Emil Velikov [Sun, 25 Oct 2015 10:04:09 +0000 (10:04 +0000)]
docs: add sha256 checksums for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ec14e6f8fd05999b482e0785d8cd286042c9c254)

9 years ago  docs: add release notes for 11.0.4
Emil Velikov [Sat, 24 Oct 2015 18:34:01 +0000 (19:34 +0100)]
docs: add release notes for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 31bf24703193cc23961923e01548b1acb2760a93)

9 years ago  i965: Make brw_varying_to_offset take a const pointer to the VUE map.
Kenneth Graunke [Sun, 26 Jul 2015 04:29:28 +0000 (21:29 -0700)]
i965: Make brw_varying_to_offset take a const pointer to the VUE map.

It doesn't modify it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago  vc4: Fix names of the 16-bit unpacks
Eric Anholt [Wed, 5 Aug 2015 04:23:53 +0000 (21:23 -0700)]
vc4: Fix names of the 16-bit unpacks

They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.

9 years ago  vc4: Don't try to register coalesce into the VPM across non-raw MOVs.
Eric Anholt [Sun, 25 Oct 2015 00:38:26 +0000 (17:38 -0700)]
vc4: Don't try to register coalesce into the VPM across non-raw MOVs.

No known bugs, just something I noticed while updating optimization code
for other changes.

9 years ago  vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.
Eric Anholt [Sun, 25 Oct 2015 00:04:49 +0000 (17:04 -0700)]
vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.

One instruction instead of four, and it turns out you do this a lot for
the Over operator.

total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs:     318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs:     6434 -> 6076 (-5.56%)
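
For reference, a CPU-side sketch of what a 4x8 unorm pack computes,
following the usual GLSL packUnorm4x8 convention (first component in the
low byte). This is an illustration of the operation's semantics, not the
vc4 code, and the rounding of exact halves is an assumption of this sketch.

```c
#include <stdint.h>

/* Illustrative reference: clamp each component to [0, 1], scale to
 * 0..255 with rounding, and place component i in byte i of the result.
 */
static uint32_t
pack_unorm_4x8(const float v[4])
{
   uint32_t packed = 0;

   for (int i = 0; i < 4; i++) {
      float c = v[i];

      if (!(c > 0.0f))   /* also maps NaN to 0 */
         c = 0.0f;
      if (c > 1.0f)
         c = 1.0f;

      uint32_t byte = (uint32_t)(c * 255.0f + 0.5f);
      packed |= byte << (8 * i);
   }

   return packed;
}
```

Done per channel this takes several scalar operations, which is why a
single hardware 8888 pack is attractive.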

9 years ago  vc4: Fix the test for skipping raw MOVs.
Eric Anholt [Sat, 24 Oct 2015 23:30:30 +0000 (16:30 -0700)]
vc4: Fix the test for skipping raw MOVs.

I don't know what the previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c.  No change to shader-db.

9 years ago  i965: Remove unused devinfo revision
Ben Widawsky [Fri, 23 Oct 2015 21:38:39 +0000 (14:38 -0700)]
i965: Remove unused devinfo revision

I left the function to obtain the revision because it is, and will continue to
be, useful in the future. I'd rather not have to dig it up every time we need it.
Comments left at the implementation to say as much.

This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8ba98ebd4ff0b56326372f0df9c73ad
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Fri Aug 7 13:58:37 2015 -0700

    i965/skl: Remove early platform support

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years ago  docs/index.html: fix typo
Fabio Pedretti [Thu, 15 Oct 2015 08:00:23 +0000 (10:00 +0200)]
docs/index.html: fix typo

Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years ago  freedreno: remove unnecessary null checks
Rob Clark [Fri, 23 Oct 2015 17:37:26 +0000 (13:37 -0400)]
freedreno: remove unnecessary null checks

According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null).  And the null checks were
inconsistent anyways, so remove them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years ago  radeonsi: Implement DCC fast clear.
Bas Nieuwenhuizen [Fri, 23 Oct 2015 23:47:45 +0000 (01:47 +0200)]
radeonsi: Implement DCC fast clear.

Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  gallivm: fix tex offsets with mirror repeat linear
Roland Scheidegger [Thu, 22 Oct 2015 21:58:50 +0000 (23:58 +0200)]
gallivm: fix tex offsets with mirror repeat linear

Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago  gallivm: fix sampling with texture offsets in SoA path
Roland Scheidegger [Thu, 22 Oct 2015 21:49:41 +0000 (23:49 +0200)]
gallivm: fix sampling with texture offsets in SoA path

When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).

This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago  softpipe: fix using non-zero layer in non-array view from array resource
Roland Scheidegger [Thu, 22 Oct 2015 20:28:28 +0000 (22:28 +0200)]
softpipe: fix using non-zero layer in non-array view from array resource

For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago  llvmpipe: fix using non-zero layer in non-array view from array resource
Roland Scheidegger [Thu, 22 Oct 2015 20:26:52 +0000 (22:26 +0200)]
llvmpipe: fix using non-zero layer in non-array view from array resource

Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources at
either the resource or the view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago  radeonsi: add Stoney to si_init_gs_info()
Alex Deucher [Fri, 23 Oct 2015 22:31:57 +0000 (18:31 -0400)]
radeonsi: add Stoney to si_init_gs_info()

This patch was originally written before stoney support
was merged.  Add stoney.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
9 years ago  radeonsi: Enable DCC.
Bas Nieuwenhuizen [Tue, 20 Oct 2015 22:10:39 +0000 (00:10 +0200)]
radeonsi: Enable DCC.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.
Bas Nieuwenhuizen [Tue, 20 Oct 2015 22:10:38 +0000 (00:10 +0200)]
radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  radeonsi: Disable operations that do not work with DCC.
Bas Nieuwenhuizen [Tue, 20 Oct 2015 22:10:37 +0000 (00:10 +0200)]
radeonsi: Disable operations that do not work with DCC.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  radeonsi: Allocate buffers for DCC.
Bas Nieuwenhuizen [Tue, 20 Oct 2015 22:10:36 +0000 (00:10 +0200)]
radeonsi: Allocate buffers for DCC.

As the alignment requirements can be 32 KiB or more, also adding
an aligned buffer creation function.
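
As a rough illustration of the arithmetic behind an aligned allocation,
the sketch below rounds an offset or size up to a power-of-two alignment
such as 32 KiB. The helper name align_up() is hypothetical and this is not
the buffer creation function the commit adds.

```c
#include <stdint.h>

/* Hypothetical helper: round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two (e.g. 32768 for 32 KiB).
 */
static uint64_t
align_up(uint64_t value, uint64_t alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}
```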

DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years ago  radeonsi: only apply the SNORM blit workaround to *8_SNORM
Marek Olšák [Thu, 22 Oct 2015 21:36:11 +0000 (23:36 +0200)]
radeonsi: only apply the SNORM blit workaround to *8_SNORM

Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  util/format: add helper util_format_is_snorm8
Marek Olšák [Thu, 22 Oct 2015 21:32:16 +0000 (23:32 +0200)]
util/format: add helper util_format_is_snorm8

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: add another requirement for PARTIAL_ES_WAVE
Marek Olšák [Mon, 19 Oct 2015 00:45:56 +0000 (02:45 +0200)]
radeonsi: add another requirement for PARTIAL_ES_WAVE

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: merge two ifs setting WD_SWITCH_ON_EOP
Marek Olšák [Sun, 18 Oct 2015 20:22:22 +0000 (22:22 +0200)]
radeonsi: merge two ifs setting WD_SWITCH_ON_EOP

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOI
Marek Olšák [Sun, 18 Oct 2015 20:17:04 +0000 (22:17 +0200)]
radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOI

This catches the other cases that enable SWITCH_ON_EOI.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: add one more SWITCH_ON_EOI requirement for Hawaii and VI
Marek Olšák [Sun, 18 Oct 2015 20:07:01 +0000 (22:07 +0200)]
radeonsi: add one more SWITCH_ON_EOI requirement for Hawaii and VI

The VI condition depends on geometry shaders and MAX_PRIMGRP_IN_WAVE.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: only apply the instancing bug workaround to Bonaire
Marek Olšák [Sun, 18 Oct 2015 19:51:41 +0000 (21:51 +0200)]
radeonsi: only apply the instancing bug workaround to Bonaire

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: add SWITCH_ON_EOI requirement for 4 SE parts
Marek Olšák [Sun, 18 Oct 2015 19:43:30 +0000 (21:43 +0200)]
radeonsi: add SWITCH_ON_EOI requirement for 4 SE parts

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: remove unnecessary PARTIAL_VS_WAVE setting for streamout
Marek Olšák [Sun, 18 Oct 2015 19:28:54 +0000 (21:28 +0200)]
radeonsi: remove unnecessary PARTIAL_VS_WAVE setting for streamout

hardware does this automatically

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: allow unbinding vertex shaders
Marek Olšák [Thu, 22 Oct 2015 19:32:23 +0000 (21:32 +0200)]
radeonsi: allow unbinding vertex shaders

Draw calls without a vertex shader are skipped.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: allow unbinding pixel shaders and remove the dummy shader
Marek Olšák [Thu, 22 Oct 2015 20:19:34 +0000 (22:19 +0200)]
radeonsi: allow unbinding pixel shaders and remove the dummy shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: add draw_vbo check for a NULL pixel shader
Marek Olšák [Thu, 22 Oct 2015 20:18:49 +0000 (22:18 +0200)]
radeonsi: add draw_vbo check for a NULL pixel shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  radeonsi: add checks for a NULL pixel shader
Marek Olšák [Thu, 22 Oct 2015 20:17:28 +0000 (22:17 +0200)]
radeonsi: add checks for a NULL pixel shader

This will allow removing the dummy PS.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  gallium/util: add a test for NULL fragment shaders
Marek Olšák [Thu, 22 Oct 2015 20:14:53 +0000 (22:14 +0200)]
gallium/util: add a test for NULL fragment shaders

Just to validate that radeonsi doesn't crash.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago  st/mesa: don't load state parameters if there are none
Marek Olšák [Thu, 22 Oct 2015 17:46:07 +0000 (19:46 +0200)]
st/mesa: don't load state parameters if there are none

Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago  radeonsi: add Stoney pci ids
Samuel Li [Thu, 22 Oct 2015 16:06:43 +0000 (12:06 -0400)]
radeonsi: add Stoney pci ids

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
9 years ago  radeonsi: add support for Stoney asics (v3)
Samuel Li [Fri, 21 Aug 2015 19:35:46 +0000 (15:35 -0400)]
radeonsi: add support for Stoney asics (v3)

v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
9 years ago  nvc0: respect edgeflag attribute width
Ilia Mirkin [Fri, 23 Oct 2015 06:14:31 +0000 (02:14 -0400)]
nvc0: respect edgeflag attribute width

The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.

Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
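
A minimal sketch of the idea, assuming a hypothetical edgeflag_format enum
and read_edgeflag() helper (neither is nvc0 code): the number of bytes read
depends on how the edge flag was supplied, so the one-byte pointer case
never touches the three bytes that follow it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of how the edge flag is stored. */
enum edgeflag_format {
   EDGEFLAG_UBYTE,  /* e.g. supplied through glEdgeFlagPointer */
   EDGEFLAG_FLOAT   /* e.g. supplied through immediate glEdgeFlag */
};

/* Read only as many bytes as the format actually provides. */
static bool
read_edgeflag(const void *data, enum edgeflag_format format)
{
   if (format == EDGEFLAG_UBYTE) {
      uint8_t b;
      memcpy(&b, data, sizeof(b));
      return b != 0;
   }

   float f;
   memcpy(&f, data, sizeof(f));
   return f != 0.0f;
}
```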

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
9 years ago  gallivm: Explicitly disable unsupported CPU features.
Jose Fonseca [Fri, 23 Oct 2015 10:40:25 +0000 (11:40 +0100)]
gallivm: Explicitly disable unsupported CPU features.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years ago  vc4: Convert blending to being done in 4x8 unorm normally.
Eric Anholt [Wed, 19 Aug 2015 16:38:14 +0000 (09:38 -0700)]
vc4: Convert blending to being done in 4x8 unorm normally.

We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.

total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs:     327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs:     15580 -> 12766 (-18.06%)

Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.

9 years ago  vc4: Add QIR/QPU support for the 8-bit vector instructions.
Eric Anholt [Fri, 9 Jan 2015 02:07:15 +0000 (18:07 -0800)]
vc4: Add QIR/QPU support for the 8-bit vector instructions.

9 years ago  vc4: Don't try to CSE non-SSA instructions.
Eric Anholt [Fri, 23 Oct 2015 15:34:01 +0000 (16:34 +0100)]
vc4: Don't try to CSE non-SSA instructions.

This can happen when we're doing destination packing -- we don't know
what's in the rest of the register.

Signed-off-by: Eric Anholt <eric@anholt.net>
9 years ago  nir: Add opcodes for saturated vector math.
Eric Anholt [Wed, 19 Aug 2015 05:38:34 +0000 (22:38 -0700)]
nir: Add opcodes for saturated vector math.

This corresponds to instructions used on vc4 for its blending inside of
shaders.  I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.

v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
    by jekstrand)
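
To make "saturated vector math" concrete, here is a CPU-side sketch of an
unsigned saturating add on four packed 8-bit channels. The function name
sat_add_4x8() is made up for this example and the code is not the NIR
opcode definition; it only shows the per-byte behaviour such an opcode
would have.

```c
#include <stdint.h>

/* Illustrative only: add the four unsigned bytes of a and b
 * independently, clamping each sum to 255 instead of wrapping.
 */
static uint32_t
sat_add_4x8(uint32_t a, uint32_t b)
{
   uint32_t result = 0;

   for (int shift = 0; shift < 32; shift += 8) {
      uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);

      if (sum > 0xff)
         sum = 0xff;

      result |= sum << shift;
   }

   return result;
}
```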

9 years ago  vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE.
Eric Anholt [Fri, 23 Oct 2015 14:26:12 +0000 (15:26 +0100)]
vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE.

9 years ago  vc4: Add a workaround for HW-2116 (state counter wrap fails).
Eric Anholt [Fri, 23 Oct 2015 13:43:41 +0000 (14:43 +0100)]
vc4: Add a workaround for HW-2116 (state counter wrap fails).

I haven't proven that this happens (I've got other GPU hangs in the
way), but the closed driver also does this and it's documented as an
erratum.

9 years ago  vc4: Fix missing \n in a perf_debug().
Eric Anholt [Fri, 23 Oct 2015 13:41:47 +0000 (14:41 +0100)]
vc4: Fix missing \n in a perf_debug().

9 years ago  i965/fs: Allow copy propagating into new surface access opcodes
Kristian Høgsberg Kristensen [Wed, 21 Oct 2015 06:22:27 +0000 (23:22 -0700)]
i965/fs: Allow copy propagating into new surface access opcodes

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Optimize ssbo stores
Kristian Høgsberg Kristensen [Thu, 22 Oct 2015 06:43:34 +0000 (23:43 -0700)]
i965/fs: Optimize ssbo stores

Write groups of enabled components together.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Drop offset_reg temporary in ssbo load
Kristian Høgsberg Kristensen [Thu, 22 Oct 2015 05:49:14 +0000 (22:49 -0700)]
i965/fs: Drop offset_reg temporary in ssbo load

Now that we don't read each component one-by-one, we don't need the
temporary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Avoid scalar destinations in emit_uniformize()
Kristian Høgsberg Kristensen [Wed, 21 Oct 2015 06:31:49 +0000 (23:31 -0700)]
i965/fs: Avoid scalar destinations in emit_uniformize()

The scalar destination registers break copy propagation. Instead compute
the results to a regular register and then reference a component when we
later use the result as a source.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Don't uniformize surface index twice
Kristian Høgsberg Kristensen [Wed, 21 Oct 2015 06:16:51 +0000 (23:16 -0700)]
i965/fs: Don't uniformize surface index twice

The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL
Kristian Høgsberg Kristensen [Wed, 21 Oct 2015 06:07:56 +0000 (23:07 -0700)]
i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL

The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register.  When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.
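
A heavily simplified sketch of the kind of type check that makes copy
propagation bail. The enum, struct and function below are hypothetical and
much cruder than the real i965 pass; they only show why a MOV whose source
type differs from its destination type is left alone.

```c
#include <stdbool.h>

/* Hypothetical register types: unsigned dword, signed dword, float. */
enum reg_type { REG_TYPE_UD, REG_TYPE_D, REG_TYPE_F };

/* Hypothetical MOV description; not the i965 instruction class. */
struct mov_inst {
   enum reg_type dst_type;
   enum reg_type src_type;
};

/* In this simplified model a MOV is only a pure copy, and therefore a
 * candidate for propagation, when both sides have the same type.
 */
static bool
can_copy_propagate(const struct mov_inst *mov)
{
   return mov->dst_type == mov->src_type;
}
```

In this model, a MOV into a UD destination from a UD immediate 0 passes the
check, whereas the same MOV from a D immediate would not.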

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/fs: Read all components of a SSBO field with one send
Kristian Høgsberg Kristensen [Sat, 17 Oct 2015 04:58:14 +0000 (21:58 -0700)]
i965/fs: Read all components of a SSBO field with one send

Instead of looping through single-component reads, read all components
in one go.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965: Don't use message headers for untyped reads
Kristian Høgsberg Kristensen [Sat, 17 Oct 2015 04:38:05 +0000 (21:38 -0700)]
i965: Don't use message headers for untyped reads

We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.

v2: Only remove header for untyped reads. Typed reads always need the
    header.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
9 years ago  i965/vec4: check opcode on vec4_instruction::reads_flag(channel)
Alejandro Piñeiro [Fri, 23 Oct 2015 13:32:30 +0000 (15:32 +0200)]
i965/vec4: check opcode on vec4_instruction::reads_flag(channel)

Commit f17b78 added an alternative reads_flag(channel) that returned
whether the instruction was reading a specific channel flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, yet the flags
are used.

That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621
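
A simplified sketch of the corrected logic, with hypothetical enums and an
instruction struct that are not the i965 classes. It ignores the
per-channel part of the real function and only shows that an opcode can
read the flag register even when no predicate is set.

```c
#include <stdbool.h>

/* Hypothetical, reduced instruction model for illustration. */
enum opcode { OPCODE_MOV, OPCODE_UNPACK_FLAGS };
enum predicate { PREDICATE_NONE, PREDICATE_NORMAL };

struct instruction {
   enum opcode opcode;
   enum predicate predicate;
};

/* The flag is read either through predication or because the opcode
 * itself consumes the flag register.
 */
static bool
reads_flag(const struct instruction *inst)
{
   return inst->predicate != PREDICATE_NONE ||
          inst->opcode == OPCODE_UNPACK_FLAGS;
}
```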

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago  vc4: Use Rob's NIR-based user clip lowering.
Eric Anholt [Wed, 21 Oct 2015 15:40:46 +0000 (16:40 +0100)]
vc4: Use Rob's NIR-based user clip lowering.

9 years ago  vc4: Also dump the decimation mode for resolved stores.
Eric Anholt [Mon, 10 Aug 2015 18:28:53 +0000 (11:28 -0700)]
vc4: Also dump the decimation mode for resolved stores.

9 years ago  vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG.
Eric Anholt [Mon, 10 Aug 2015 18:27:21 +0000 (11:27 -0700)]
vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG.

9 years ago  vc4: Add a sentinel after simulator buffers for buffer overflow detection.
Eric Anholt [Thu, 22 Oct 2015 10:31:56 +0000 (11:31 +0100)]
vc4: Add a sentinel after simulator buffers for buffer overflow detection.

This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead.  The downside is that it only catches
writes, not reads.

It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
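
A minimal sketch of the sentinel technique, assuming plain malloc-backed
buffers. The function names, the 16-byte sentinel size and the fill byte
are all choices made for this example, not values from the vc4 simulator
code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SENTINEL_SIZE 16    /* assumed size for this example */
#define SENTINEL_BYTE 0xa5  /* assumed fill pattern */

/* Allocate size usable bytes followed by a sentinel region filled with
 * a known pattern. Returns NULL on allocation failure.
 */
static void *
alloc_with_sentinel(size_t size)
{
   uint8_t *buf = malloc(size + SENTINEL_SIZE);

   if (!buf)
      return NULL;

   memset(buf + size, SENTINEL_BYTE, SENTINEL_SIZE);
   return buf;
}

/* Check the sentinel after the work is done: any changed byte means
 * something wrote past the end of the buffer.
 */
static bool
sentinel_intact(const void *buf, size_t size)
{
   const uint8_t *tail = (const uint8_t *)buf + size;

   for (size_t i = 0; i < SENTINEL_SIZE; i++) {
      if (tail[i] != SENTINEL_BYTE)
         return false;
   }
   return true;
}
```

As the commit notes, a check like this can only notice out-of-bounds
writes; a stray read leaves the pattern untouched.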

9 years ago  glsl: fix shader storage block member rules when adding program resources
Samuel Iglesias Gonsalvez [Wed, 21 Oct 2015 07:46:48 +0000 (09:46 +0200)]
glsl: fix shader storage block member rules when adding program resources

Commit f24e5e did not take into account arrays of named shader
storage blocks.

Fixes 20 dEQP-GLES31.functional.ssbo.* tests:

dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.2
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.29
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.33
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.3

V2:
- Rename some variables (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agoilo: add support for scratch spaces
Chia-I Wu [Thu, 22 Oct 2015 16:45:49 +0000 (00:45 +0800)]
ilo: add support for scratch spaces

When a kernel reports a non-zero per-thread scratch space size, make sure the
hardware state is correctly set up, and a scratch bo is allocated.

9 years agoilo: fix scratch space setup in core
Chia-I Wu [Thu, 22 Oct 2015 16:24:26 +0000 (00:24 +0800)]
ilo: fix scratch space setup in core

Move scratch_size out of ilo_state_shader_kernel_info and
ilo_state_compute_interface_info.  A scratch space is shared by all
kernels/interfaces.  Update builder to emit relocs for scratch bos.

9 years agoglsl: remove excess location qualifier validation
Timothy Arceri [Thu, 22 Oct 2015 05:18:44 +0000 (16:18 +1100)]
glsl: remove excess location qualifier validation

A location could never be negative because it has always been
validated in the parser.

Also, the linker doesn't check for negative values, contrary to what the
comment claims.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agodocs: update relnotes to mention virgl driver.
Dave Airlie [Fri, 23 Oct 2015 02:11:43 +0000 (12:11 +1000)]
docs: update relnotes to mention virgl driver.

9 years agovirgl/vtest: add vtest driver
Dave Airlie [Fri, 13 Mar 2015 04:15:47 +0000 (14:15 +1000)]
virgl/vtest: add vtest driver

virgl/vtest is a swrast driver that allows the
virgl acceleration to be tested without having
a virtual machine.

The backend has a unix socket server that
this connects to.

This is run by setting
LIBGL_ALWAYS_SOFTWARE=y
GALLIUM_DRIVER=virpipe

In this mode all rendering is sent over
a socket to the remote renderer, and the
results are read back and copied to the screen
using drisw. This works well enough to develop
new features and to help debug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agovirgl: add driver for virtio-gpu 3D (v2)
Dave Airlie [Thu, 22 Jan 2015 05:11:47 +0000 (15:11 +1000)]
virgl: add driver for virtio-gpu 3D (v2)

virgl is the 3D acceleration backend for the
virtio-gpu shipping with qemu.

The 3D acceleration is designed around gallium
and TGSI as the virtualisation layer. The backend
renderer translates the virgl interface into
OpenGL currently.

This is the initial import of the driver to mesa.

The kernel driver portions are lined up for drm-next.

Currently this driver supports up to GL3.3 and some
misc extensions if the host driver exposes them. The
plan is to iterate the virgl API to new GL levels
as mesa host drivers gain features.

v2: fix resource tracking across flushes to avoid
->bind hack in mapping.
consolidate mapping and waiting code for transfers.
use u_range for dirty tracking.
handle larger shaders in protocol.
include virtgpu_drm.h in mesa for now.
add translation layer for gallium tgsi to virgl tgsi.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agotgsi: try and handle overflowing shaders. (v2)
Dave Airlie [Thu, 22 Jan 2015 05:18:05 +0000 (15:18 +1000)]
tgsi: try and handle overflowing shaders. (v2)

This is used to detect errors in virgl if we overflow the shader
dumping buffers.
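
One way to picture overflow detection in a dump buffer is sketched below.
This is only an illustration under assumed names (sketch_dump_buf and its
helpers are hypothetical and are not the TGSI dump API): appends that do
not fit set a flag, and the final call reports success as a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded text buffer that remembers whether it overflowed. */
struct sketch_dump_buf {
   char *ptr;        /* next write position */
   size_t left;      /* bytes left, including room for the terminator */
   bool overflowed;
};

static void
sketch_dump_init(struct sketch_dump_buf *b, char *storage, size_t size)
{
   assert(size > 0);
   storage[0] = '\0';
   b->ptr = storage;
   b->left = size;
   b->overflowed = false;
}

/* Append text if it fits with its terminator; otherwise record the overflow. */
static void
sketch_dump_append(struct sketch_dump_buf *b, const char *text)
{
   size_t len = strlen(text);
   if (len >= b->left) {
      b->overflowed = true;
      return;
   }
   memcpy(b->ptr, text, len + 1);
   b->ptr += len;
   b->left -= len;
}

/* Returns true only if every append fit in the buffer. */
static bool
sketch_dump_finish(const struct sketch_dump_buf *b)
{
   return !b->overflowed;
}
```

Callers can then treat a false result as an error instead of using a
truncated dump.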

v2: return a bool.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agotgsi: add option to dump floats as hex values
Dave Airlie [Mon, 16 Sep 2013 00:13:55 +0000 (10:13 +1000)]
tgsi: add option to dump floats as hex values

This adds support to the parser to accept hex values as floats,
and then adds support to the dumper to let the user choose
to dump floats as 32-bit hex numbers.

This is required to get accurate values for virgl use of TGSI.
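
To illustrate why a hex dump keeps values exact, the sketch below writes a
float's 32-bit pattern as hex text and reads it back. The function names
are hypothetical and the "0x%08x" text form is an assumption for this
example, not necessarily the format the TGSI dumper emits; it also assumes
a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reinterpret the float's storage as a 32-bit integer (no value conversion). */
static uint32_t
sketch_float_bits(float f)
{
   uint32_t bits;
   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&bits, &f, sizeof(bits));
   return bits;
}

/* Inverse of sketch_float_bits(). */
static float
sketch_bits_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* Dump: write the float's bit pattern as eight hex digits. */
static void
sketch_dump_float_hex(char *out, size_t out_size, float f)
{
   snprintf(out, out_size, "0x%08" PRIx32, sketch_float_bits(f));
}

/* Parse: read such a token back into the identical float. */
static float
sketch_parse_float_hex(const char *text)
{
   return sketch_bits_float((uint32_t)strtoul(text, NULL, 16));
}
```

Because no decimal rounding is involved, the parsed value has the same bit
pattern as the dumped one.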

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agosvga: Condition preemptive flush on draw emission
Sinclair Yeh [Tue, 13 Oct 2015 19:58:26 +0000 (12:58 -0700)]
svga: Condition preemptive flush on draw emission

On ultra high resolution modes, the preemptive flush flag can be
set midway through command submission, a condition that a
flush-retry cannot recover from, causing rendering artifacts.

This patch prevents a preemptive flush until a draw has been
emitted.

Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agosvga: try to avoid index generation for some primitive types
Brian Paul [Fri, 16 Oct 2015 22:14:46 +0000 (16:14 -0600)]
svga: try to avoid index generation for some primitive types

The svga device doesn't directly support quads, quad strips or polygons
so we have to convert those types to indexed triangle lists.  But we
can sometimes avoid that if we're drawing flat/constant-colored prims
and we don't have to worry about provoking vertex.
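
For reference, the conversion that this change tries to avoid can be
sketched as below: each quad becomes two triangles in an index list. The
function name is hypothetical, and the vertex order chosen here ignores
provoking-vertex rules, so it is only a valid stand-in for the
constant-color case described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill indices_out with a triangle list covering
 * num_verts / 4 independent quads. Needs room for 6 indices per quad.
 * Returns the number of indices written.
 */
static unsigned
sketch_quads_to_tri_indices(unsigned num_verts, unsigned *indices_out)
{
   unsigned num_quads = num_verts / 4;
   unsigned n = 0;

   assert(indices_out != NULL);

   for (unsigned q = 0; q < num_quads; q++) {
      unsigned base = q * 4;

      /* First triangle: v0, v1, v2 */
      indices_out[n++] = base + 0;
      indices_out[n++] = base + 1;
      indices_out[n++] = base + 2;

      /* Second triangle: v0, v2, v3 */
      indices_out[n++] = base + 0;
      indices_out[n++] = base + 2;
      indices_out[n++] = base + 3;
   }
   return n;
}
```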

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agosvga: avoid provoking vertex conversion when possible
Brian Paul [Fri, 16 Oct 2015 22:12:19 +0000 (16:12 -0600)]
svga: avoid provoking vertex conversion when possible

Provoking vertex comes into play when doing flat shading.  But if we know
that all fragments in a primitive are the same color, the provoking vertex
doesn't matter.  Check for that case and use whichever provoking vertex
convention is supported by the device.

This avoids generating an index buffer to do the PV conversion.
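
The decision described above might look roughly like this. It is a sketch
under assumptions: the struct, its fields and the function are hypothetical
and do not mirror the svga driver's state tracking.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_pv_mode { SKETCH_PV_FIRST, SKETCH_PV_LAST };

/* Hypothetical summary of the state that matters for the decision. */
struct sketch_draw_state {
   bool flat_shade;            /* flat shading enabled */
   bool constant_color;        /* all fragments get the same color */
   enum sketch_pv_mode api_pv; /* convention requested by the API */
   bool device_pv_first;       /* device supports first-vertex convention */
   bool device_pv_last;        /* device supports last-vertex convention */
};

/* Returns true if an index buffer must be generated to convert the PV. */
static bool
sketch_need_pv_conversion(const struct sketch_draw_state *s)
{
   assert(s->device_pv_first || s->device_pv_last);

   /* Provoking vertex only matters for flat shading. */
   if (!s->flat_shade)
      return false;

   /* Same color everywhere: any supported convention gives the same image. */
   if (s->constant_color)
      return false;

   return s->api_pv == SKETCH_PV_FIRST ? !s->device_pv_first
                                       : !s->device_pv_last;
}
```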

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agosvga: detect constant color writes in fragment shaders
Brian Paul [Thu, 22 Oct 2015 21:36:25 +0000 (15:36 -0600)]
svga: detect constant color writes in fragment shaders

Examine the fragment shader to try to detect TGSI shaders which use
"MOV OUT[0], CONST[i]" to write a constant value for the fragment color.
In this case, all fragments will have the same color (unless blending is
enabled).

This is a common case for OpenGL code such as: glColor(), glBegin(),
glVertex(), ..., glEnd() when lighting/fog/etc are disabled.  In this
case, the Mesa/gallium state tracker actually generates a simple
"MOV OUT[0], CONST[i]" fragment shader.

This will be used by the next commit to avoid provoking vertex conversion
(creating/rewriting an index buffer) when drawing flat-shaded primitives.
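
A simplified picture of the detection described above is sketched below.
The instruction representation is a hypothetical stand-in, not the TGSI
token format, and the sketch assumes OUT[0] is the fragment color output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified shader representation. */
enum sketch_file { SKETCH_FILE_CONST, SKETCH_FILE_INPUT,
                   SKETCH_FILE_OUTPUT, SKETCH_FILE_TEMP };
enum sketch_op { SKETCH_OP_MOV, SKETCH_OP_MUL };

struct sketch_reg {
   enum sketch_file file;
   unsigned index;
};

struct sketch_instr {
   enum sketch_op op;
   struct sketch_reg dst;
   struct sketch_reg src;
};

/*
 * Returns true if OUT[0] is written exactly once, by a MOV whose source
 * is a constant register, i.e. the "MOV OUT[0], CONST[i]" pattern.
 */
static bool
sketch_writes_constant_color(const struct sketch_instr *prog, unsigned count)
{
   bool found = false;

   assert(prog != NULL || count == 0);

   for (unsigned i = 0; i < count; i++) {
      const struct sketch_instr *ins = &prog[i];

      /* Only writes to OUT[0] are interesting. */
      if (ins->dst.file != SKETCH_FILE_OUTPUT || ins->dst.index != 0)
         continue;

      if (found || ins->op != SKETCH_OP_MOV ||
          ins->src.file != SKETCH_FILE_CONST)
         return false;

      found = true;
   }
   return found;
}
```

As the message notes, blending can still change the final color, so a
caller would have to account for that separately.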

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agomesa: check for unchanged line width before error checking
Brian Paul [Wed, 21 Oct 2015 19:58:04 +0000 (13:58 -0600)]
mesa: check for unchanged line width before error checking

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agost/mesa: use _mesa_RasterPos() when possible
Brian Paul [Wed, 21 Oct 2015 19:42:37 +0000 (13:42 -0600)]
st/mesa: use _mesa_RasterPos() when possible

The st_RasterPos() function goes to great pains to implement the
rasterpos transformation.  It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.

But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.

This can add up for legacy apps that make a lot of calls to glRasterPos.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agotnl: remove t_rasterpos.c
Brian Paul [Wed, 21 Oct 2015 19:42:21 +0000 (13:42 -0600)]
tnl: remove t_rasterpos.c

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agodrivers/common: use _mesa_RasterPos instead of _tnl_RasterPos
Brian Paul [Wed, 21 Oct 2015 19:40:58 +0000 (13:40 -0600)]
drivers/common: use _mesa_RasterPos instead of _tnl_RasterPos

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agomesa: copy rasterpos evaluation code into core Mesa
Brian Paul [Wed, 21 Oct 2015 19:39:15 +0000 (13:39 -0600)]
mesa: copy rasterpos evaluation code into core Mesa

We'll remove it from the tnl module next.  By lifting this code into core
Mesa we can use it from the gallium state tracker.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agovbo: optimize vertex copying when 'wrapping'
Brian Paul [Wed, 21 Oct 2015 19:38:23 +0000 (13:38 -0600)]
vbo: optimize vertex copying when 'wrapping'

Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.
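
The two approaches can be compared with a small sketch. The function names
and the float-based vertex layout are assumptions for illustration and are
not taken from the vbo code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n vertices of vertex_size floats each, one memcpy() per vertex. */
static void
sketch_copy_per_vertex(float *dst, const float *src,
                       unsigned n, unsigned vertex_size)
{
   for (unsigned i = 0; i < n; i++) {
      memcpy(dst + (size_t)i * vertex_size,
             src + (size_t)i * vertex_size,
             vertex_size * sizeof(float));
   }
}

/* Same result with a single call; valid because both regions are contiguous. */
static void
sketch_copy_at_once(float *dst, const float *src,
                    unsigned n, unsigned vertex_size)
{
   assert(dst != NULL && src != NULL);
   memcpy(dst, src, (size_t)n * vertex_size * sizeof(float));
}
```

The single call copies the same bytes, since vertex i starts exactly where
vertex i - 1 ends in both the source and the destination.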

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoradeon/uvd: don't expose HEVC on old UVD hw (v3)
Alex Deucher [Thu, 22 Oct 2015 16:24:42 +0000 (12:24 -0400)]
radeon/uvd: don't expose HEVC on old UVD hw (v3)

The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.

v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well.  Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agoi965/vec4: print predicate control at brw_vec4 dump_instruction
Alejandro Piñeiro [Fri, 9 Oct 2015 16:39:42 +0000 (18:39 +0200)]
i965/vec4: print predicate control at brw_vec4 dump_instruction

v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
    a copy on brw_vec4.c, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests
Alejandro Piñeiro [Thu, 1 Oct 2015 14:41:30 +0000 (16:41 +0200)]
i965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests

The complete way to do this would be to parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new flag) is present
(see intel_debug.c).

But that seems like overkill for the unit tests, whose most
common use case, after all, is being run when calling
make check.
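
A minimal sketch of the envvar approach is shown below. The variable name
SKETCH_PRINT_ASSEMBLY and both helpers are hypothetical, chosen only for
this example; the tests may use a different name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define SKETCH_PRINT_ENVVAR "SKETCH_PRINT_ASSEMBLY"

/* Treat any value, even an empty one, as "enabled"; only unset means off. */
static bool
sketch_env_is_set(const char *name)
{
   assert(name != NULL);
   return getenv(name) != NULL;
}

/* Unit tests would call this once and print the assembly only if it is true. */
static bool
sketch_should_print_assembly(void)
{
   return sketch_env_is_set(SKETCH_PRINT_ENVVAR);
}
```

With this, a plain make check stays quiet, while a developer can set the
variable in the environment to get the dump for a single run.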

v2: use the same idea for the fs counterpart too, as suggested by
    Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>