Tom Stellard [Wed, 10 Dec 2014 01:05:44 +0000 (20:05 -0500)]
radeonsi/compute: Use relocs for scratch pointer rather than user sgprs v2
Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.
v2:
- Make sure not to break older LLVM.
Tom Stellard [Wed, 10 Dec 2014 01:03:50 +0000 (20:03 -0500)]
radeon: Teach radeon_elf_read() how to parse reloc information v3
v2:
- Use strdup for copying reloc names.
- Free reloc memory.
v3:
- Add free_relocs parameter to radeon_shader_binary_free_members()
Tom Stellard [Wed, 14 Jan 2015 15:01:29 +0000 (10:01 -0500)]
radeon: Add a helper function for freeing members of radeon_shader_binary
Kenneth Graunke [Sun, 18 Jan 2015 07:21:15 +0000 (23:21 -0800)]
i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'. Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two. Either draw by itself works
fine, but together, they hang the GPU. Removing the glUniform call
makes the hangs disappear. In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear. I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).
I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further. We have no real tools,
and the hardware people moved on years ago. I've analyzed 20+ error
states and read every scrap of documentation I could find.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Fri, 16 Jan 2015 09:40:33 +0000 (01:40 -0800)]
i965/nir: Enable SIMD16 support in the NIR FS backend.
With the previous commits in place, it just works.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 16 Jan 2015 21:16:18 +0000 (13:16 -0800)]
i965/nir: Use offset() instead of altering reg_offset directly.
offset() properly handles reg_width, so it'll work for SIMD16.
While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 16 Jan 2015 10:12:17 +0000 (02:12 -0800)]
i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).
brw_fs_nir.cpp creates almost all of its registers via:
fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));
When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.
This patch replaces that pattern with a new "vgrf" helper method:
fs_reg reg = vgrf(num_components);
The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.
v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.
v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Fri, 16 May 2014 09:21:51 +0000 (02:21 -0700)]
i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).
I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).
This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.
v2: Rebase from May 2014 -> January 2015.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Marek Olšák [Sun, 11 Jan 2015 17:38:03 +0000 (18:38 +0100)]
st/mesa: don't set vs.key.clamp_color if a shader doesn't write any colors
And update some comments.
Marek Olšák [Mon, 12 Jan 2015 13:35:24 +0000 (14:35 +0100)]
winsys/radeon: increase the size of buffer cache
This should fix this performance regression:
https://bugs.freedesktop.org/show_bug.cgi?id=88227
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Carl Worth [Mon, 19 Jan 2015 18:49:41 +0000 (10:49 -0800)]
Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h
The filename of sha1.h was conflicting with the system-provided
sha1.h, (and in some confiurations, our sha1.c was unsuccessfully
attemping to include "sha1.h" and <sha1.h> as two different files).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88523
Martin Peres [Mon, 19 Jan 2015 08:52:05 +0000 (10:52 +0200)]
mesa: fix a trivial spelling mistake
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Fri, 16 Jan 2015 10:48:43 +0000 (12:48 +0200)]
mesa: support GL_RGB for GL_EXT_texture_type_2_10_10_10_REV
Commit
8ec6534 changed texture upload path and the way how texture
format is being checked, this commit adds support for GL_RGB with
GL_UNSIGNED_INT_2_10_10_10_REV as specified by the extension
EXT_texture_type_2_10_10_10_REV specification.
This fixes regression in ES3 conformance test
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Micah Fedke [Wed, 31 Dec 2014 20:16:52 +0000 (14:16 -0600)]
mesa: Add ARB_shader_precision infrastructure
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Sat, 17 Jan 2015 09:01:35 +0000 (01:01 -0800)]
i965/fs: Fix the dummy fragment shader.
We hit an assertion that the destination of the FB write should not be
an immediate. (I don't know what we were thinking.) Use ARF null.
Trying to substitute real shaders with the dummy shader would crash
when trying to upload non-existent uniforms. Say there are none.
It also wouldn't generate any code because we didn't compute the CFG,
and code generation now requires it. Compute it.
Gen4-5 also require a message header to be present.
On Gen6+, there were assertion failures in SF/SBE state because
urb_setup was memset to 0 instad of -1, causing it to think there were
attributes when nothing was set up right. Set to no attributes.
Finally, you have to ensure "Setup URB Entry Read Length" is non-zero
or you get GPU hangs, at least on Crestline.
It now works on at least Crestline and Haswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kristian Høgsberg [Sat, 17 Jan 2015 05:54:54 +0000 (21:54 -0800)]
gbm: Define _DEFAULT_SOURCE to avoid warning
glibc 2.19 introduced _DEFUAULT_SOURCE as a replacement for _BSD_SOURCE,
and deprecates _BSD_SOURCE with an annoying warning. Defining both is
how you're supposed to transition so let's do that. It gets rid of the
warning and we can figure out when/if we can drop _BSD_SOURCE later.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Vinson Lee [Sat, 17 Jan 2015 00:21:41 +0000 (16:21 -0800)]
sha1: Fix gcry_md_hd_t typo.
Fix build error.
CC libmesautil_la-sha1.lo
sha1.c: In function '_mesa_sha1_final':
sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)
gcry_md_hd_t h = (grcy_md_hd_t) ctx;
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88519
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Vinson Lee [Sat, 17 Jan 2015 00:14:51 +0000 (16:14 -0800)]
nir: s/malloc.h/stdlib.h/
Fix build error on Mac OS X.
CC nir_to_ssa.lo
nir_to_ssa.c:29:10: fatal error: 'malloc.h' file not found
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88478
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Kristian Høgsberg [Fri, 16 Jan 2015 22:42:27 +0000 (14:42 -0800)]
i965: Fix up too-wide comment
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Kristian Høgsberg [Fri, 16 Jan 2015 22:29:40 +0000 (14:29 -0800)]
gbm/dri: Fix const confusion
The driver name is no longer const, it's always allocated dynamically
one way or another. Drop const from dri_screen_create_dri2
driver_name argument to avoid warning.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Carl Worth [Wed, 14 Jan 2015 23:53:00 +0000 (15:53 -0800)]
configure: Add machinery for --enable-shader-cache (and --disable-shader-cache)
We don't actually have the code for the shader cache just yet, but
this configure machinery puts everything in place so that the shader
cache can be optionally compiled in.
Specifically, if the user passes no option (neither
--disable-shader-cache, nor --enable-shader-cache), then this feature
will be automatically detected based on the presence of a usable SHA-1
library. If no suitable library can be found, then the shader cache
will be automatically disabled, (and reported in the final output from
configure).
The user can force the shader-cache feature to not be compiled, (even
if a SHA-1 library is detected), by passing
--disable-shader-cache. This will prevent the compiled Mesa libraries
from depending on any library for SHA-1 implementation.
Finally, the user can also force the shader cache on with
--enable-shader-cache. This will cause configure to trigger a fatal
error if no sutiable SHA-1 implementation can be found for the
shader-cache feature.
Bug fix by José Fonseca <jfonseca@vmware.com>: Fix to put conditional
assignment in Makefile.am, not Makefile.sources to avoid breaking
scons build.
Note: As recommended by José, with this commit the scons build will
not compile any of the SHA-1-using code. This is waiting for someone
to write SConstruct detection of the available SHA-1 libraries, (and
set the appropriate HAVE_SHA1_* variables).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Carl Worth [Fri, 12 Dec 2014 21:55:30 +0000 (13:55 -0800)]
mesa: Add mesa SHA-1 functions
The upcoming shader cache uses the SHA-1 algorithm for cryptographic
naming. These new mesa_sha1 functions are implemented with any one of
several differeny cryptographics libraries.
This code was copied from the xserver repository, (where it has
apparently been functioning well on a variety of operating systems),
and comes licensed with a license identical to that of Mesa.
Bug fixes by José Fonseca <jfonseca@vmware.com>: Fix to put
conditional assignment in Makefile.am, not Makefile.sources to avoid
breaking scons build. Fix include file for CryptoAPI section. Fix
missing cast in openssl section.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Carl Worth [Thu, 11 Dec 2014 22:33:44 +0000 (14:33 -0800)]
configure: Add copyright and license block to configure.ac
Prior to copying in code from the xserver configure.ac file, it makes
sense to have the license of this file clearly marked, (to show that
it's licensed identically to the configure.ac file from the xserver
repository).
And since the text of the license refers to "the above copyright
notice" it also makes sense to have an actual copyright attribution in
place.
I generated this list of names by looking at the output of:
git shortlog -n --format=%aD -- configure.ac
(and arbitrarily stopping for contributors with fewer than 15
commits). Then for each name, I looked for existing Copyright
attributions in the mesa source tree with the same name, (and using
"Intel Corporation" as the copyright holder where I knew that was
appropriate).
Carl Worth [Mon, 15 Dec 2014 23:58:34 +0000 (15:58 -0800)]
glsl: Add unit tests for blob.c
In addition to exercising all of the functions in blob.h, this
includes a stress test that forces some reallocing, and also tests to
verify the alignment and overrun-detection code in blob.c.
Tapani Pälli [Thu, 13 Nov 2014 07:16:51 +0000 (23:16 -0800)]
glsl: Add blob_overwrite_bytes and blob_overwrite_uint32
These functions are useful when serializing an unknown number of items
to a blob. The caller can first save the current offset, write a
placeholder uint32, write out (and count) the items, then use
blob_overwrite_uint32 with the saved offset to replace the placeholder
value.
Then, when deserializing, the reader will first read the count and
know how many subsequent items to expect.
(I wrote this code after reading a very similar patch written by
Tapani when he wrote serialization code for IR. Since I re-used the
idea of his code so directly, I've credited him as the author of this
code. --Carl)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Carl Worth [Thu, 4 Dec 2014 22:16:47 +0000 (14:16 -0800)]
glsl: Add blob.c---a simple interface for serializing data
This new interface allows for writing a series of objects to a chunk
of memory (a "blob").. The allocated memory is maintained within the
blob itself, (and re-allocated by doubling when necessary).
There are also functions for reading objects from a blob as well. If
code attempts to read beyond the available memory, the read functions
return 0 values (or its moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to ensure whether any invalid values were previously
returned due to attempts to read too far.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tapani Pälli [Mon, 2 Jun 2014 12:05:51 +0000 (15:05 +0300)]
mesa: Add iterate method for string_to_uint_map
The upcoming shader cache needs this to be able to cache hash data
from the gl_shader_program structure.
Edited-by: Carl Worth <cworth@cworth.org>:
There is an internal implementation detail that the hash table
underlying the struct string_to_uint_map stores each value internally
as (value+1). The user needn't be very concerned with this (other than
knowing that a value of UINT_MAX cannot be stored) since put() adds 1
and get() subtracts 1.
So in this commit, rather than call the user's function directly with
hash_table_call_foreach, we call through a wrapper that fixes up the
off-by-one values before the caller's callback sees them.
And with this wrapper in place, we also give a better signature to the
callback function being passed to iterate(), so that this callback
function can actually expect a char* and an unsigned argument, (rather
than a couple of void* ).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Carl Worth [Fri, 5 Dec 2014 16:05:44 +0000 (08:05 -0800)]
util: Make unreachable at least be an assert
Previously, if __builtin_unreachable() was unavailable, the
unreachable macro was defined to do nothing. We do better here, by at
least still making it an assert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Carl Worth [Wed, 22 Oct 2014 23:58:26 +0000 (16:58 -0700)]
glsl: Add convenience function get_sampler_instance
This is similar to the existing functions get_instance,
get_array_instance, etc. for getting a type singleton. The new
get_sampler_instance() function will be used by the upcoming shader
cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Fri, 16 Jan 2015 08:53:53 +0000 (00:53 -0800)]
i965: Fix some oddities in FB_WRITE register width and execution size.
Previously, we generated this for FB writes in SIMD16 mode:
load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F
fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf
The LOAD_PAYLOAD's destination had its register width set to 8, and the
FB_WRITE had its execution size set to 8. This seems wrong, and while
it probably doesn't affect anything, we should fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Kenneth Graunke [Fri, 16 Jan 2015 09:05:21 +0000 (01:05 -0800)]
i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer.
In order to support calling lower_load_payload() inside a condition,
this patch makes OPT() a statement expression:
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
We recently did the equivalent change in the vec4 backend (commit
9b8bd67768769b685c25e1276e053505aede5f93).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Neil Roberts [Wed, 14 Jan 2015 15:14:05 +0000 (15:14 +0000)]
format_utils: Use a more precise conversion when decreasing bits
When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:
x * 32 / 256
This works as if it's taking a value from a range where 256 represents 1.0 and
scaling it down to a range where 32 represents 1.0. However this is not
correct because it is actually 255 and 31 that represent 1.0.
We can do better with a formula like this:
(x * 31 + 127) / 255
The +127 is to make it round correctly.
The new code has a special case to use uint64_t when the result of the
multiplication would overflow an unsigned int. This function is inline and
only ever called with constant values so hopefully the if statements will be
folded.
The main incentive to do this is to make the CPU conversion path pick the same
values as the hardware would if it did the conversion. This fixes failures
with the ‘texsubimage pbo’ test when using the patches from here:
http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html
v2: Use 64-bit arithmetic when src_bits+dst_bits > 32
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Iago Toral Quiroga [Wed, 7 Jan 2015 09:08:57 +0000 (10:08 +0100)]
i965/gen6: Fix crash with VS+TF after rendering with GS
Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBegintransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jason Ekstrand [Fri, 19 Dec 2014 19:49:58 +0000 (11:49 -0800)]
nir/live_variables: Use a worklist
This is a rework of the liveness algorithm using a worklist as suggested by
Connor. Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop. Also, the stuff after the last loop
in the funciton will only ever get visited once.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 19 Dec 2014 19:05:02 +0000 (11:05 -0800)]
nir: Add a worklist helper structure
A worklist is a common concept in optimizations. This adds a structure
that we can reuse for many different types of optimizations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Brian Paul [Fri, 16 Jan 2015 00:38:39 +0000 (17:38 -0700)]
nir: fix incorrect argument passed to validate_src() in validate_tex_instr()
Silences a compiler warning.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Brian Paul [Thu, 15 Jan 2015 22:28:14 +0000 (15:28 -0700)]
nir: silence compiler warning from visit_src() call
v2: use proper argument
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Brian Paul [Thu, 15 Jan 2015 22:30:40 +0000 (15:30 -0700)]
mesa: move GET_CURRENT_CONTEXT() to top of _mesa_init_renderbuffer()
To fix MSVC build.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mike Mason [Wed, 14 Jan 2015 20:12:27 +0000 (12:12 -0800)]
mesa: Fix render buffer initial internal format in GLES 3
Changes the initial internal format of a render buffer
to GL_RGBA4 in GLES 3. This fixes a failure in the following
DrawElements test:
dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Thu, 15 Jan 2015 17:31:18 +0000 (09:31 -0800)]
util/hash_set: Rework the API to know about hashing
Previously, the set API required the user to do all of the hashing of keys
as it passed them in. Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it. Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names. This is
especially bad when the standard call looks something like
_mesa_set_add(ht, _mesa_pointer_hash(key), key);
In the above case, there is no reason why the hash set shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
This is analygous to
94303a0750 where we did this for hash_table
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 15 Jan 2015 16:06:05 +0000 (08:06 -0800)]
util: Move main/set to util/hash_set
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Thu, 15 Jan 2015 15:58:07 +0000 (07:58 -0800)]
hash_table: Rename insert_with_hash to insert_pre_hashed
We already have search_pre_hashed. This makes the APIs match better.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Mon, 12 Jan 2015 21:58:06 +0000 (13:58 -0800)]
i965: Don't consider null dst instructions as matching non-null dst.
When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:
cmp.g.f0.0(8) null:D, vgrf43:F, 0.
500000f
...
cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.
500000f
we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.
Don't do anything if the entry's destination is null and the
instruction's destination is non-null.
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Matt Turner [Mon, 12 Jan 2015 18:48:04 +0000 (10:48 -0800)]
i965/vec4: Make sure that imm writes are to registers in the same file.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887
Matt Turner [Tue, 13 Jan 2015 21:35:15 +0000 (13:35 -0800)]
i965/fs: Emit MADs from (x + abs(y * z)).
Just use the abs source modifier on both of the multiplicand
arguments.
instructions in affected programs: 300 -> 296 (-1.33%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Matt Turner [Sat, 20 Dec 2014 05:30:16 +0000 (21:30 -0800)]
i965/fs: Emit MADs from (x + -(y * z)).
Just use the negation source modifier on one of the multiplicand
arguments.
total instructions in shared programs:
5889529 ->
5880016 (-0.16%)
instructions in affected programs: 600846 -> 591333 (-1.58%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Jason Ekstrand [Thu, 15 Jan 2015 03:08:32 +0000 (19:08 -0800)]
nir/algebraic: Only replace an instruction once
Without the break, it was possible that an instruction would match multiple
expressions. If this happened, you could end up trying to replace it
multiple times and get a segfault. This makes it so that, after a
successful replacement, it moves on to the next instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 23:25:52 +0000 (15:25 -0800)]
i965/nir: Do a final copy lowering pass before lowering locals to regs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 23:20:49 +0000 (15:20 -0800)]
nir/vars_to_ssa: Use the copy lowering from lower_var_copies
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 23:19:49 +0000 (15:19 -0800)]
nir: Add a pass for lowering copy instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 22:00:12 +0000 (14:00 -0800)]
nir/vars_to_ssa: Refactor get_deref_node
This refactor allows you to more easily get the deref node associated with
a given variable. We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 20:41:15 +0000 (12:41 -0800)]
nir: Rename lower_variables to lower_vars_to_ssa
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 10 Jan 2015 04:01:13 +0000 (20:01 -0800)]
nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 7 Jan 2015 00:11:57 +0000 (16:11 -0800)]
nir/validate: Only build in debug mode
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 6 Jan 2015 21:15:40 +0000 (13:15 -0800)]
nir/lower_variables: Improve documentation
Additional description was added to a variety of places. Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 6 Jan 2015 21:14:00 +0000 (13:14 -0800)]
nir/lower_variables: Use a for loop for get_deref_node
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 6 Jan 2015 19:13:22 +0000 (11:13 -0800)]
nir: Use the actual FNV-1a hash for hashing derefs
We also switch to using loops rather than recursion.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 6 Jan 2015 06:31:23 +0000 (22:31 -0800)]
util/hash_table: Pull the details of the FNV-1a into helpers
This way the basics of the FNV-1a hash can be reused to easily create other
hashing functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Fri, 19 Dec 2014 23:56:55 +0000 (15:56 -0800)]
nir: Make intrinsic flags into an enum
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 19 Dec 2014 23:30:15 +0000 (15:30 -0800)]
nir: Use static inlines instead of macros for list getters
This should make debugging a lot easier as GDB handles static inlines much
better than macros. Also, static inlines are typesafe.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 19 Dec 2014 22:55:45 +0000 (14:55 -0800)]
nir/variable: Remove the constant_value field
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 19 Dec 2014 01:13:22 +0000 (17:13 -0800)]
nir: Add some documentation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 18 Dec 2014 22:42:01 +0000 (14:42 -0800)]
nir/lower_variables: Follow the Cytron paper more closely
Previously, our variable renaming algorithm, while similar to the one in
the Cytron paper, was not the same. While I'm pretty sure it was correct,
it will be easier for readers of the code in the variable renaming pass if
it follows more closely. This commit removes the automatic stack popping
we were doing and replaces it with explicit popping like Cytron does.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 18 Dec 2014 22:01:37 +0000 (14:01 -0800)]
nir/print: Various cleanups recommended by Eric
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 18 Dec 2014 21:49:43 +0000 (13:49 -0800)]
nir/lower_variables: Add a bunch of comments and re-arrange a few things
This commit seeks to make the lower_variables pass much more clear by
adding a pile of comments and re-arranging a few things. There are no
functional or algorithmic changes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 18 Dec 2014 00:53:04 +0000 (16:53 -0800)]
nir: Rename parallel_copy_copy to parallel_copy_entry and add a foreach macro
parallel_copy_copy was a silly name. Also, things were getting long and
annoying, so I added a foreach macro. For historical reasons, several of
the original iterations over parallel copy entries in from_ssa used the
_safe variants of the loop. However, all of these no longer ever remove an
entry so it's ok to make them all use the normal iterator.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 17 Dec 2014 23:34:47 +0000 (15:34 -0800)]
nir/from_ssa: Clean up parallel copy handling and document it better
Previously, we were doing a lazy creation of the parallel copy
instructions. This is confusing, hard to get right, and involves some
extra state tracking of the copies. This commit adds an extra walk over
the basic blocks to add the block-end parallel copies up front. This
should be much less confusing and, consequently, easier to get right. This
commit also adds more comments about parallel copies to help explain what
all is going on.
As a consequence of these changes, we can now remove the at_end parameter
from nir_parallel_copy_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 17 Dec 2014 22:49:24 +0000 (14:49 -0800)]
nir: Rename nir_block_following_if to nir_block_get_following_if
The new name is a little longer but less confusing.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 17 Dec 2014 20:34:27 +0000 (12:34 -0800)]
i965/fs_nir: Handle sample ID, position, and mask better
Before, we were emitting the full pile of setup instructions for sample_id
and sample_pos every time they were used. With this commit, we emit them
in their own pass once at the beginning of the shader and simply emit uses
later on. When it comes time for setting up VS, we can put setup for its
special values in the same pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 22:43:26 +0000 (14:43 -0800)]
nir/opcodes: Remove the per_component info field
Originally, this field was intended for determining if the given
instruction acted per-component or if it had mismatching source and
destination sizes that would have to be interpreted specially. However, we
can easily derive this from output_size == 0, so it's not really that
useful. Also, the values we were setting in nir_opcodes.h for this field
were completely bogus and it was never used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 20:26:38 +0000 (12:26 -0800)]
nir/search: Use nir_op_infos to determine if an operation is commutative
Prior to this commit, we had a big switch statement for this. Now it's
baked into the opcode metadata so we can just use that.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 20:22:01 +0000 (12:22 -0800)]
nir/opcodes: Add algebraic properties metadata
This commit adds some algebraic properties to the metadata of each opcode
in NIR. In particular, you now know, just from the metadata, if a given
opcode is commutative or associative. This will be useful for algebraic
transformation passes that want to be able to match a + b as well as b + a
in one go.
v2: Make algebraic properties all caps. This was more consistent with the
intrinsics flags and seems better for flags in general.
Also, the enums are now declared with (1 << n) rather then hex values.
v3: fmin and fmax technically aren't commutative or associative. Things
get funny when one of the arguments is a NaN.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 01:32:56 +0000 (17:32 -0800)]
nir: Make load_const SSA-only
As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 01:44:37 +0000 (17:44 -0800)]
nir: Make nir_ssa_undef_instr_create initialize the destination
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 00:12:04 +0000 (16:12 -0800)]
i965/nir: Move the other lowering passes to before out-of-SSA
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 01:37:07 +0000 (17:37 -0800)]
nir/lower_system_values: Handle SSA destinations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 00:23:21 +0000 (16:23 -0800)]
nir/lower_atomics: Use/support SSA
Previously, lower_atomics was non-SSA only. We assert-failed if the
destination of an atomic operation intrinsic was an SSA def and we used
temporary registers for computing offsets. This commit changes both of
these behaviors. We now use SSA values for computing offsets (so we can
optimize them) and we handle SSA destinations. We also move the pass to
run before we go out of SSA on i965 as it now generates SSA values.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Mon, 15 Dec 2014 23:15:01 +0000 (15:15 -0800)]
nir/live_variables: Use the new ssa_def iterator
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 16 Dec 2014 03:38:14 +0000 (19:38 -0800)]
nir: Use nir_foreach_ssa_def for setting up ssa destinations
Before, we were using foreach_dest and switching on whether the destination
was an SSA value. This works, except not all destinations are SSA values
so we have to special-case ssa_undef instructions. Now that we have a
foreach_ssa_def function, we can iterate over all of the register
destinations in one pass and iterate over the SSA destinations in a second.
This way, if we add other ssa-only instructions, we won't have to worry
about adding them to the special case we have for ssa_undef.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Mon, 15 Dec 2014 23:12:59 +0000 (15:12 -0800)]
nir: Add a foreach_ssa_def function
There are some functions whose destinations are SSA-only and so aren't a
nir_dest. This provides a function that is capable of iterating over the
SSA definitions defined by those functions. If you want registers, you
should use the old iterator.
v2: Kenneth Graunke <kenneth@whitecape.org>:
- Fix nir_foreach_ssa_def's return value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Mon, 15 Dec 2014 22:06:58 +0000 (14:06 -0800)]
nir/lower_variables: Use a real dominance DFS for variable renaming
Previously, we were just iterating over the program "in order" which
kind-of approximates a DFS, but not really. In particular, we got the
following case wrong:
loop {
a = 3;
if (foo) {
a = 5;
} else {
break;
}
use(a);
}
where use(a) would get 3 instead of 5 because of premature popping of the
SSA def stack. Now, since we do an actaul DFS, we should evaluate use(a)
immediately after a = 5 and we should be ok.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 13 Dec 2014 06:38:41 +0000 (22:38 -0800)]
nir: Remove predication
We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 13 Dec 2014 04:37:04 +0000 (20:37 -0800)]
nir: Make bcsel a fully vector operation
Previously, the condition was a scalar that applied to all components
simultaneously. As of this commit, the condition is a vector and each
component is switched seperately.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 13 Dec 2014 00:25:38 +0000 (16:25 -0800)]
nir: Call nir_metadata_preserve more places
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 13 Dec 2014 00:22:46 +0000 (16:22 -0800)]
nir/metadata: Rename metadata_dirty to metadata_preserve
nir_metadata_dirty was a terrible name because the parameter it takes is
the metadata to be preserved. This is really confusing because it looks
like it's doing the opposite of what it is actually doing. Now it's named
sensibly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 6 Dec 2014 00:43:56 +0000 (16:43 -0800)]
i965/fs_nir: Add support for indirect texture arrays
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Jason Ekstrand [Sat, 6 Dec 2014 00:09:53 +0000 (16:09 -0800)]
nir: Rework the way samplers are lowered
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Jason Ekstrand [Thu, 8 Jan 2015 01:52:37 +0000 (17:52 -0800)]
nir/tex_instr_create: Initialize all 4 sources
This helps a lot with things like lowering passes that may need to add
sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 5 Dec 2014 22:46:24 +0000 (14:46 -0800)]
nir/tex_instr: Rename the indirect source type and add an array size
In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 9 Dec 2014 01:34:52 +0000 (17:34 -0800)]
nir: Use a source for uniform buffer indices instead of an index
In GLSL-to-NIR we were just setting the base index to 0 whenever there was
an indirect so having it expressed as a sum makes no sense. Also, while a
base offset may make sense for the memory location (first element in the
array, etc.) it makes less sense for the actual uniform buffer index. This
may change later, but it seems to make more sense for now.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 5 Dec 2014 20:05:55 +0000 (12:05 -0800)]
nir: Constant fold array indirects
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 5 Dec 2014 19:03:06 +0000 (11:03 -0800)]
nir: Make texture instruction names more consistent
This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 15 Nov 2014 06:09:27 +0000 (22:09 -0800)]
nir: Remove the ffma peephole
This is no longer needed because it's now part of the algebraic
optimization pass
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 15 Nov 2014 05:35:25 +0000 (21:35 -0800)]
nir: Add a basic constant folding pass
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 12 Dec 2014 19:13:10 +0000 (11:13 -0800)]
nir: Add an algebraic optimization pass
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 15 Nov 2014 01:47:56 +0000 (17:47 -0800)]
nir: Add infastructure for generating algebraic transformation passes
This commit builds on the nir_search.h infastructure by adding a bit of
python code that makes it stupid easy to write an algebraic transformation
pass. The nir_algebraic.py file contains four python classes that
correspond directly to the datastructures in nir_search.c and allow you to
easily generate the C code to represent them. Given a list of
search-and-replace operations, it can then generate a function that applies
those transformations to a shader.
The transformations can be specified manually, or they can be specified
using nested tuples. The nested tuples make a neat little language for
specifying expression trees and search-and-replace operations in a very
readable and easy-to-edit fasion.
The generated code is also fairly efficient. Insteady of blindly calling
nir_replace_instr with every single transformation and on every single
instruction, it uses a switch statement on the instruction opcode to do a
first-order culling and only calls nir_replace_instr if the opcode is known
to match the first opcode in the search expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 14 Nov 2014 05:19:28 +0000 (21:19 -0800)]
nir: Add an expression matching framework
This framework provides a simple way to do simple search-and-replace
operations on NIR code. The nir_search.h header provides four simple data
structures for representing expressions: nir_value and four subtypes:
nir_variable, nir_constant, and nir_expression. An expression tree can
then be represented by nesting these data structures as needed. The
nir_replace_instr function takes an instruction, an expression, and a
value; if the instruction matches the expression, it is replaced with a new
chain of instructions to generate the given replacement value. The
framework keeps track of swizzles on sources and automatically generates
the currect swizzles for the replacement value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 14 Nov 2014 01:23:58 +0000 (17:23 -0800)]
nir/glsl: Emit abs, neg, and sat operations instead of source modifiers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 5 Dec 2014 19:00:05 +0000 (11:00 -0800)]
nir: Make the type casting operations static inline functions
Previously, the casting operations were macros. While this is usually
fine, the casting macro used the input parameter twice leading to strange
behavior when you passed the result of another function into it. Since we
know the source and destination types explicitly, we don't loose anything
by making it a function.
Also, this gives us a nice little macro for creating cast function that
will hopefully prevent mistyping.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 13 Nov 2014 03:18:05 +0000 (19:18 -0800)]
nir: Add a lowering pass for adding source modifiers where possible
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>