Anuj Phogat [Thu, 20 Sep 2012 20:17:19 +0000 (13:17 -0700)]
meta: Add on demand compilation of per target shader programs
A call to glGenerateMipmap() follows the generation of a relevant
shader program in setup_glsl_generate_mipmap().
To support all texture targets and to avoid compiling shaders
every time, per-target shader programs are compiled on demand
and saved for the next call.
Fixes float-texture(mipmap.manual):
See Comment 6: https://bugs.freedesktop.org/show_bug.cgi?id=54296
NOTE: This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
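The compile-on-demand idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual meta code: the enum, the struct, and compile_program_for() are hypothetical stand-ins, and the "program handle" is a fake integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical texture targets; the real code keys on GL texture targets. */
enum tex_target { TARGET_1D, TARGET_2D, TARGET_3D, TARGET_CUBE, TARGET_COUNT };

/* One cached program handle per target; 0 means "not compiled yet". */
struct mipmap_shader_cache {
   unsigned program[TARGET_COUNT];
   unsigned compile_count;   /* illustration only: how many compiles ran */
};

/* Hypothetical stand-in for the expensive compile-and-link step. */
static unsigned
compile_program_for(struct mipmap_shader_cache *cache, enum tex_target target)
{
   cache->compile_count++;
   return 100u + (unsigned)target;   /* fake, nonzero program handle */
}

/* Compile the program for this target on first use, then reuse it. */
static unsigned
get_mipmap_program(struct mipmap_shader_cache *cache, enum tex_target target)
{
   assert(target < TARGET_COUNT);
   if (cache->program[target] == 0)
      cache->program[target] = compile_program_for(cache, target);
   return cache->program[target];
}
```

With a zero-initialized cache, two requests for the same target trigger a single compile, and a request for a different target triggers one more.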
Tom Stellard [Mon, 17 Sep 2012 14:31:31 +0000 (14:31 +0000)]
clover: Initialize height and depth to 1 for transfers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tom Stellard [Thu, 13 Sep 2012 14:53:32 +0000 (14:53 +0000)]
pipe-loader: Remove a few debug_printfs
On debug builds these were always being printed.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tom Stellard [Thu, 13 Sep 2012 15:21:42 +0000 (15:21 +0000)]
radeon/llvm: Handle loads from the constants address space.
Reading from constant memory is not supported yet, so constant reads use
global memory.
Tom Stellard [Thu, 13 Sep 2012 15:20:46 +0000 (15:20 +0000)]
radeon/llvm: Add support for v4f32 stores on R600
Tom Stellard [Thu, 13 Sep 2012 15:19:48 +0000 (15:19 +0000)]
radeon/llvm: Add support for i8 reads on R600
Tom Stellard [Thu, 13 Sep 2012 15:14:26 +0000 (15:14 +0000)]
radeon/llvm: Expand vector fadd and fmul on R600
Tom Stellard [Thu, 13 Sep 2012 15:08:40 +0000 (15:08 +0000)]
radeon/llvm: Add optimization for FP_ROUND
Tom Stellard [Thu, 13 Sep 2012 15:04:15 +0000 (15:04 +0000)]
radeon/llvm: Replace AMDGPU pow intrinsic with the llvm version
Paul Berry [Thu, 13 Sep 2012 03:51:07 +0000 (20:51 -0700)]
i965/blorp: Fix narrowing warnings.
Blorp has to convert rectangle coordinates from integers to floats in
order to send them down the GPU pipeline. Recent versions of GCC
issue a warning for this, since a float is not capable of precisely
representing all possible 32-bit integer values. Suppress the warning
with an explicit type cast in the case of blorp, since rectangle
coordinates will never be large enough to cause a loss of precision.
Reviewed-by: Eric Anholt <eric@anholt.net>
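As background for the precision argument above: an IEEE 754 single-precision float has a 24-bit significand, so every integer of magnitude up to 2^24 converts exactly. A small sketch of a round-trip check follows; the helper name is made up for illustration and is not part of blorp.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if converting v to float and back yields v again. */
static int
survives_float_round_trip(int32_t v)
{
   float f = (float)v;                 /* the explicit cast silences the warning */
   return (int64_t)f == (int64_t)v;    /* compare in a type wide enough for both */
}
```

Values such as 8192 or 16777216 (2^24) round-trip exactly, while 16777217 (2^24 + 1) does not. Rectangle coordinates are far below that threshold.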
Kenneth Graunke [Thu, 20 Sep 2012 23:31:15 +0000 (16:31 -0700)]
i965: Remove brw_set_predicate_inverse(p, true) from scratch offset code
Given that it exists between a push/pop of instruction state, this call
can only affect the MOV or ADD instruction generated just below it.
Neither of those instructions is predicated, so it makes no sense to
ask for the inverse predicate.
This fixes grumblings from the simulator debugger, which was
complaining about an invalid predicate.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 19 Sep 2012 19:01:14 +0000 (12:01 -0700)]
mesa: Don't override S3TC internalFormat if data is pre-compressed.
Commit 42723d88d intended to override an S3TC internalFormat to a
generic compressed format when the application requested online
compression of uncompressed data. Unfortunately, it also broke
pre-compressed textures when libtxc_dxtn isn't installed but the
extensions are forced on.
Both glCompressedTexImage2D() and glTexImage2D() call teximage(), which
calls _mesa_choose_texture_format(), hitting this override code. If we
have actual S3TC source data, we can't treat it as any other format, and
need to avoid the override.
Since glCompressedTexImage2D() passes in a format of GL_NONE (which is
illegal for glTexImage), we can use that to detect the pre-compressed
case and avoid the overrides.
Fixes a regression since 42723d88d370a7599398cc1c2349aeb951ba1c57.
NOTE: This is a candidate for the 9.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Tue, 11 Sep 2012 23:20:43 +0000 (16:20 -0700)]
i965/blorp: Add support for blits between SRGB and linear formats.
Fixes colorspace issues in L4D2 when multisampling is enabled (the
scene was far too dark, but the flashlight area was way too bright).
The nVidia and AMD binary drivers both allow this kind of blit.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Tue, 4 Sep 2012 18:29:30 +0000 (11:29 -0700)]
mesa: Ignore SRGB when determining compatible resolve formats.
MSAA resolves and other blit-like operations ignore SRGB state anyway,
so we should be able to safely allow resolves between compatible
SRGB/linear formats like SRGBA8 and RGBA8888.
This matches the behavior of the nVidia and AMD binary drivers.
Fixes completely black rendering when using multisampling in L4D2.
NOTE: This is a candidate for the 9.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Andreas Boll [Thu, 20 Sep 2012 14:23:15 +0000 (16:23 +0200)]
docs: update some more FAQs
v2: remove mention of XFree86
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:37 +0000 (16:01 +0200)]
docs: remove utility.html
This page is very old and some of the links are dead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:35 +0000 (16:01 +0200)]
docs: remove science.html
This page is very old and some of the links are dead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:32 +0000 (16:01 +0200)]
docs: remove modelers.html
This page is very old and some of the links are dead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:24 +0000 (16:01 +0200)]
docs: remove libraries.html
This page is very old and some of the links are dead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:21 +0000 (16:01 +0200)]
docs: remove games.html
This page is very old and some of the links are dead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:18 +0000 (16:01 +0200)]
docs/contents: add autoconf.html link
make it easier to find the docs/autoconf.html site
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:15 +0000 (16:01 +0200)]
docs: convert last traces of progs to mesa/demos repository
v2: fix typo
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:12 +0000 (16:01 +0200)]
docs: add IRC info
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:08 +0000 (16:01 +0200)]
docs/egl: improve markup
replace unordered list <ul> with definition list <dl>
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:01:03 +0000 (16:01 +0200)]
docs/autoconf: improve markup
replace unordered list <ul> with definition list <dl>
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 14:00:52 +0000 (16:00 +0200)]
docs/autoconf: remove obsolete demo options
removed with commit 56c3cce2a199f7f79a48d7633431e1e80fcd4ba2 two years ago
Reviewed-by: Brian Paul <brianp@vmware.com>
Andreas Boll [Thu, 20 Sep 2012 13:22:37 +0000 (15:22 +0200)]
docs: improve quality of gears.png
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Wed, 19 Sep 2012 18:43:38 +0000 (12:43 -0600)]
gallium: mention PIPE_TIMEOUT_INFINITE in the fence_finish() comment
Brian Paul [Thu, 20 Sep 2012 15:13:37 +0000 (09:13 -0600)]
llvmpipe: fix overflow bug in total texture size computation
v2: use uint64_t for the total_size variable, per Jose.
Also add two earlier checks for exceeding the max texture size.
For example a 1K^3 RGBA volume would overflow the lpr->image_stride
variable.
Use simple algebra to avoid overflow in intermediate values.
So instead of "x * y > z" use "x > z / y".
This should work if we happen to be on a platform that doesn't have
64-bit types.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
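The "x > z / y" rewrite mentioned above can be sketched like this. It is a generic illustration with a made-up helper name, not the llvmpipe code itself. For unsigned integers the two forms agree whenever y is nonzero, because x * y > z holds exactly when x is greater than the truncated quotient z / y.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if x * y would exceed limit, without ever computing x * y. */
static int
product_exceeds(uint32_t x, uint32_t y, uint32_t limit)
{
   if (y == 0)
      return 0;             /* the product is 0, which never exceeds limit */
   return x > limit / y;    /* same truth value as x * y > limit, no overflow */
}
```

For a 1K^3 RGBA volume, a 2^30-texel count times 4 bytes wraps to 0 in 32-bit arithmetic, so a naive "x * y > limit" test would pass; the division form correctly reports that the 1 GB limit is exceeded.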
Alex Deucher [Thu, 20 Sep 2012 15:16:36 +0000 (11:16 -0400)]
r600g/llvm: rs780/rs880 are r600 asics
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ian Romanick [Tue, 18 Sep 2012 13:19:18 +0000 (15:19 +0200)]
mesa: Allow glGetTexParameter of GL_TEXTURE_SRGB_DECODE_EXT
This was already (correctly) supported for glGetSamplerParameter paths.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tom Stellard [Thu, 6 Sep 2012 04:20:27 +0000 (00:20 -0400)]
r300/compiler: Use precomputed q values in the register allocator
Tom Stellard [Thu, 6 Sep 2012 04:20:27 +0000 (00:20 -0400)]
r300g: Init regalloc state during context creation
Initializing the regalloc state is expensive, and since it is always
the same for every compile we only need to initialize it once per
context. This should help improve shader compile times for the driver.
Tom Stellard [Mon, 3 Sep 2012 12:25:13 +0000 (08:25 -0400)]
r300/compiler: Don't create register classes for inputs
Tom Stellard [Mon, 3 Sep 2012 14:43:45 +0000 (10:43 -0400)]
ra: Add q_values parameter to ra_set_finalize()
This allows the user to pass precomputed q values to the allocator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tom Stellard [Mon, 3 Sep 2012 12:23:02 +0000 (08:23 -0400)]
ra: Clarify usage of ra_set_node_reg()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tom Stellard [Wed, 1 Aug 2012 20:42:53 +0000 (20:42 +0000)]
r600g: Invalidate texture cache when creating vertex buffers for compute v2
Compute shaders fetch data from vertex buffers via the texture cache, so
we need to make sure the texture cache is flushed.
v2:
- Fix rebase mistake
- Fix spelling in comment
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tom Stellard [Mon, 17 Sep 2012 14:33:56 +0000 (14:33 +0000)]
r600g: Use LOOP_START_DX10 for loops
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tom Stellard [Thu, 13 Sep 2012 17:15:57 +0000 (17:15 +0000)]
r600g: Set the correct value of COLOR*_DIM for RATs
For buffers (which is what is being used for RATs), the
COLOR*_DIM.WIDTH_MASK field needs to be set to the low 16-bits of the
buffer size, and the COLOR*_DIM.HEIGHT_MAX needs to be set to the
high bits.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
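The low/high split described above amounts to the following arithmetic. This is only a sketch: the struct and field names are hypothetical, and how the two values are packed into the real register is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two halves of a buffer size. */
struct color_dim_parts {
   uint32_t low16;       /* low 16 bits of the buffer size */
   uint32_t high_bits;   /* remaining high bits of the buffer size */
};

static struct color_dim_parts
split_buffer_size(uint32_t size)
{
   struct color_dim_parts parts;
   parts.low16 = size & 0xffffu;
   parts.high_bits = size >> 16;
   return parts;
}
```

For example, a size of 0x12345678 splits into 0x5678 and 0x1234, and (high_bits << 16) | low16 reproduces the original size.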
Tom Stellard [Thu, 13 Sep 2012 17:14:56 +0000 (17:14 +0000)]
r600g: Make sure to initialize DB_DEPTH_CONTROL register for compute
The kernel CS checker will fail if this register is not initialized.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tom Stellard [Thu, 13 Sep 2012 14:37:53 +0000 (14:37 +0000)]
r600g: Add some comments and debug printfs to compute code
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tom Stellard [Wed, 19 Sep 2012 19:27:32 +0000 (15:27 -0400)]
r600g: Add missing break to case statement
Michal Sciubidlo [Wed, 12 Sep 2012 06:57:01 +0000 (08:57 +0200)]
radeon/llvm: Emit ISA for ALU instructions in the R600 code emitter
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
Tom Stellard [Wed, 19 Sep 2012 16:23:20 +0000 (12:23 -0400)]
radeon/llvm: Only support 512 constant registers on R600
This is necessary for upcoming encoding changes, since we will only be
using 9-bits for register encoding.
Andreas Boll [Wed, 19 Sep 2012 16:22:37 +0000 (18:22 +0200)]
docs: update faq
Andreas Boll [Wed, 19 Sep 2012 16:22:31 +0000 (18:22 +0200)]
docs: update sourcetree
- add OpenCL state tracker Clover
- add XvMC state tracker
- remove progs
  directory got moved into its own repository mesa/demos
- remove vf
  directory removed with abda64efce73c18d49c74
Andreas Boll [Wed, 19 Sep 2012 16:22:19 +0000 (18:22 +0200)]
docs: remove obsolete r300c traces
Brian Paul [Wed, 19 Sep 2012 16:07:45 +0000 (10:07 -0600)]
Revert "mesa: consolidate subtexture x/y/width/height error checking code"
This reverts commit 5b807400a87d5efefc481017eb420b772933e1da.
Accidentally pushed.
Brian Paul [Wed, 19 Sep 2012 16:07:34 +0000 (10:07 -0600)]
Revert "more comment"
This reverts commit 5205db6a7ce623a7fca72e6dc6391bd12be3f6aa.
Accidentally pushed.
Brian Paul [Wed, 19 Sep 2012 16:07:22 +0000 (10:07 -0600)]
Revert "mesa: clean-up and fix glCompressedTexSubImage error checking"
This reverts commit 0c67fe5d2dc6d8066fc23c39184d9614abf63992.
Accidentally pushed.
Brian Paul [Wed, 19 Sep 2012 16:01:04 +0000 (10:01 -0600)]
docs: fix "Cppyright" typo
Brian Paul [Tue, 18 Sep 2012 21:51:33 +0000 (15:51 -0600)]
mesa: clean-up and fix glCompressedTexSubImage error checking
Brian Paul [Tue, 18 Sep 2012 21:42:06 +0000 (15:42 -0600)]
more comment
Brian Paul [Tue, 18 Sep 2012 21:22:41 +0000 (15:22 -0600)]
mesa: consolidate subtexture x/y/width/height error checking code
This is the code that checks if a subtexture region is aligned to the
compressed format's block size.
Andreas Boll [Tue, 18 Sep 2012 17:31:28 +0000 (19:31 +0200)]
docs: remove obsolete target attribute
Andreas Boll [Tue, 18 Sep 2012 16:59:33 +0000 (18:59 +0200)]
docs: news.html is the new index.html
Andreas Boll [Tue, 18 Sep 2012 16:57:54 +0000 (18:57 +0200)]
docs: remove obsolete frame layout
Andreas Boll [Tue, 18 Sep 2012 16:57:02 +0000 (18:57 +0200)]
docs: add new iframe layout
Andreas Boll [Wed, 19 Sep 2012 15:15:45 +0000 (17:15 +0200)]
docs/news: linkify some active links
Andreas Boll [Wed, 19 Sep 2012 15:15:39 +0000 (17:15 +0200)]
docs/news: deactivate dead links
I have left the links as <code> elements for the purpose of
documentation.
Andreas Boll [Wed, 19 Sep 2012 15:15:34 +0000 (17:15 +0200)]
docs/news: drop redundant link
Andreas Boll [Wed, 19 Sep 2012 15:15:31 +0000 (17:15 +0200)]
docs/news: update link
Andreas Boll [Wed, 19 Sep 2012 15:15:24 +0000 (17:15 +0200)]
docs/news: remove link to a non-existent page
Andreas Boll [Sat, 1 Sep 2012 09:18:19 +0000 (11:18 +0200)]
docs: fix some issues in relnotes
improve markup
fix link to relnotes-9.0
add missing relnotes links
Andreas Boll [Wed, 19 Sep 2012 10:10:32 +0000 (12:10 +0200)]
docs/devinfo: fix typo
Vadim Girlin [Wed, 19 Sep 2012 00:48:16 +0000 (04:48 +0400)]
winsys/radeon: fix relocs caching
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
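The hazard described above is generic: once an array is grown with realloc(), previously saved element pointers may dangle. One common remedy, sketched below under the assumption of a simple growable list, is to cache an index instead of a pointer. All names here are hypothetical and do not reflect the winsys code.

```c
#include <assert.h>
#include <stdlib.h>

struct reloc { unsigned handle; };

struct reloc_list {
   struct reloc *items;     /* may move whenever the array grows */
   size_t count, capacity;
   size_t cached_index;     /* an index stays meaningful after realloc() */
   int cache_valid;
};

/* Append a handle; returns its index, or -1 if allocation fails. */
static long
reloc_add(struct reloc_list *list, unsigned handle)
{
   if (list->count == list->capacity) {
      size_t new_cap = list->capacity ? list->capacity * 2 : 4;
      struct reloc *p = realloc(list->items, new_cap * sizeof *p);
      if (!p)
         return -1;
      list->items = p;
      list->capacity = new_cap;
   }
   list->items[list->count].handle = handle;
   return (long)list->count++;
}

/* Look up a handle, trying the cached index before a linear search. */
static long
reloc_lookup(struct reloc_list *list, unsigned handle)
{
   if (list->cache_valid && list->items[list->cached_index].handle == handle)
      return (long)list->cached_index;
   for (size_t i = 0; i < list->count; i++) {
      if (list->items[i].handle == handle) {
         list->cached_index = i;
         list->cache_valid = 1;
         return (long)i;
      }
   }
   return -1;
}
```

Because the cache is re-resolved through list->items on every lookup, growing the array between lookups cannot produce a stale hit.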
Vincent Lejeune [Mon, 17 Sep 2012 20:20:18 +0000 (22:20 +0200)]
radeon/llvm: Add a fdiv pattern.
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
Vincent Lejeune [Tue, 11 Sep 2012 15:56:39 +0000 (17:56 +0200)]
radeon/llvm: also reserve the corresponding 128-bit reg
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
Andreas Boll [Tue, 18 Sep 2012 12:39:17 +0000 (14:39 +0200)]
docs: drop obsolete sourceforge link
Signed-off-by: Brian Paul <brianp@vmware.com>
Brian Paul [Mon, 17 Sep 2012 01:44:07 +0000 (19:44 -0600)]
softpipe: implement the new can_create_resource() function
And define a SP_MAX_TEXTURE_SIZE value as we do in llvmpipe.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 17 Sep 2012 01:43:50 +0000 (19:43 -0600)]
llvmpipe: implement the new can_create_resource() function
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 17 Sep 2012 01:42:15 +0000 (19:42 -0600)]
st/mesa: implement new proxy texture code
If the gallium driver implements the can_create_resource() function, call
it to do proxy texture size checks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 17 Sep 2012 01:40:13 +0000 (19:40 -0600)]
gallium: add new pipe_screen::can_create_resource() function
Used to implement proxy textures. If a gallium driver doesn't implement
this function we'll just continue to use the core Mesa fallback code.
Without this hook we really have no good way to implement OpenGL proxy
textures with gallium drivers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
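The optional-hook pattern described above might look roughly like the sketch below. The struct layouts, the hook signature, and both size policies are simplified assumptions for illustration; they are not the real gallium or core Mesa definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical resource description. */
struct resource_template {
   uint32_t width, height, depth, bytes_per_texel;
};

/* Simplified, hypothetical screen with an optional hook (may be NULL). */
struct screen {
   bool (*can_create_resource)(struct screen *screen,
                               const struct resource_template *templ);
};

/* Example driver hook: refuse anything above an assumed 1 GB budget. */
static bool
limit_1gb_hook(struct screen *screen, const struct resource_template *t)
{
   (void)screen;
   uint64_t total = (uint64_t)t->width * t->height * t->depth * t->bytes_per_texel;
   return total <= ((uint64_t)1 << 30);
}

/* Stand-in for the core fallback: only per-dimension limits (assumed 4096). */
static bool
core_fallback_check(const struct resource_template *t)
{
   return t->width <= 4096 && t->height <= 4096 && t->depth <= 4096;
}

/* Ask the driver if it provides the hook, otherwise use the fallback. */
static bool
proxy_texture_ok(struct screen *screen, const struct resource_template *t)
{
   if (screen->can_create_resource)
      return screen->can_create_resource(screen, t);
   return core_fallback_check(t);
}
```

In this sketch a 1024 x 1024 x 1024 RGBA request is rejected when the hook is present but slips through the dimension-only fallback, which is the kind of gap a driver-side answer is meant to close.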
Brian Paul [Mon, 17 Sep 2012 01:15:28 +0000 (19:15 -0600)]
mesa: take cube faces into account in _mesa_test_proxy_teximage()
There will always be six cube faces so take that into consideration when
computing the texture size and comparing against the limit.
Brian Paul [Mon, 17 Sep 2012 01:14:56 +0000 (19:14 -0600)]
mesa: handle GL_PROXY_TEXTURE_CUBE_MAP in _mesa_num_tex_faces()
Brian Paul [Mon, 17 Sep 2012 01:05:51 +0000 (19:05 -0600)]
llvmpipe: set max cube texture size to 4K x 4K
Before, the limit was 8K. For 32-bit RGBA that would require 1.5 GB
of memory (w/out mipmaps). That's well beyond the LP_MAX_TEXTURE_SIZE
of 1GB.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
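The arithmetic behind these numbers, including the six cube faces, can be checked with a short helper. The function name is made up for illustration; only the base level is counted, matching the "w/out mipmaps" figure above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for the base level of a cube map: six square faces. */
static uint64_t
cube_base_level_bytes(uint32_t edge, uint32_t bytes_per_texel)
{
   return (uint64_t)edge * edge * bytes_per_texel * 6;
}
```

An 8K cube at 4 bytes per texel comes to 8192 * 8192 * 4 * 6 = 1610612736 bytes (1.5 GB), above a 1 GB limit, while a 4K cube needs 402653184 bytes (384 MB).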
Brian Paul [Sat, 15 Sep 2012 16:30:20 +0000 (10:30 -0600)]
mesa: move/fix levels check for glTexStorage()
Fix copy&paste error and move min levels check closer to max levels check.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Sat, 15 Sep 2012 16:30:20 +0000 (10:30 -0600)]
mesa: rewrite glTexStorage() code
Simplify the code and make it more like the other glTexImage commands.
Call _mesa_legal_texture_dimensions() to validate width, height, depth.
Call ctx->Driver.TestProxyTexImage() to make sure texture is not too large.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Sat, 15 Sep 2012 16:30:20 +0000 (10:30 -0600)]
mesa: rework texture size error checking
There are two aspects to texture image size checking:
1. Are the width, height, depth legal values (not negative, not larger
than the max size for the mipmap level, etc)?
2. Is the texture just too large to handle? For example, we might not be
able to really allocate memory for a 3D texture of maxSize x maxSize x
maxSize.
Previously, we did (1) via the ctx->Driver.TestProxyTexImage() hook
but those tests are really device-independent. Now we do (2) via that
hook since the max texture memory and texture shape are device-dependent.
Also, (1) is now done outside the general texture parameter error checking
functions because of the special interaction with proxy textures. The
recently introduced PROXY_ERROR token is removed.
The teximage() and copyteximage() functions are a bit simpler now (less
if-then nesting, etc.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
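The split between checks (1) and (2) can be sketched as two separate functions. This is a simplified illustration with hypothetical names and parameters; the real code also deals with borders, targets, and proxy error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check (1): device-independent legality of the dimensions for a mip level. */
static bool
legal_dimensions(int level, int max_levels, int width, int height, int depth)
{
   assert(max_levels > 0 && max_levels <= 15);
   if (level < 0 || level >= max_levels)
      return false;
   int max_size = 1 << (max_levels - 1 - level);   /* halves with each level */
   return width >= 0 && width <= max_size &&
          height >= 0 && height <= max_size &&
          depth >= 0 && depth <= max_size;
}

/* Check (2): device-dependent "is it simply too large" test.
 * max_bytes is an assumed per-device budget. */
static bool
fits_memory_budget(int width, int height, int depth,
                   int bytes_per_texel, uint64_t max_bytes)
{
   uint64_t total = (uint64_t)width * (uint64_t)height *
                    (uint64_t)depth * (uint64_t)bytes_per_texel;
   return total <= max_bytes;
}

/* Run (1) first so that (2) only ever sees non-negative dimensions. */
static bool
texture_size_ok(int level, int max_levels, int width, int height, int depth,
                int bytes_per_texel, uint64_t max_bytes)
{
   return legal_dimensions(level, max_levels, width, height, depth) &&
          fits_memory_budget(width, height, depth, bytes_per_texel, max_bytes);
}
```

A 2048^3 RGBA texture passes check (1) when 12 levels are allowed, yet fails check (2) against a 1 GB budget, which is the maxSize x maxSize x maxSize case mentioned above.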
Brian Paul [Sat, 15 Sep 2012 16:30:20 +0000 (10:30 -0600)]
mesa: refactor _mesa_test_proxy_teximage() code
Basically, move the body into a new _mesa_legal_texture_dimensions() function.
More refactoring to come.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Sat, 15 Sep 2012 16:30:20 +0000 (10:30 -0600)]
mesa: move glTexImage 'level' error checking
Move level checking out of _mesa_test_proxy_teximage() and into
the other error-checking functions.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Tue, 18 Sep 2012 01:46:17 +0000 (19:46 -0600)]
mesa: change create_version_string() return type to void
Fixes "warning: no return statement in function returning non-void"
Dave Airlie [Sat, 15 Sep 2012 03:26:39 +0000 (13:26 +1000)]
glsl: make _mesa_builtin_uniform_desc static
I can't see any reason this is global (unless for debugging)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tom Stellard [Tue, 28 Aug 2012 19:25:35 +0000 (15:25 -0400)]
radeon/llvm: Initial flow control support for SI
This adds basic flow control support for If-Then-Else blocks using
predicates (stored in the EXEC register) and a predicate stack for
nested flow control.
Xinya Zhang [Mon, 17 Sep 2012 08:35:06 +0000 (16:35 +0800)]
r600g: Close a memory leak of llvm byte streams
No regressions found in the tests of opencl-example/run_tests.sh.
Signed-off-by: Xinya Zhang <zxy_thf@hotmail.com>
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
Tom Stellard [Mon, 17 Sep 2012 19:08:00 +0000 (19:08 +0000)]
radeon/llvm: Fix unused variable warning
Tom Stellard [Mon, 17 Sep 2012 19:06:25 +0000 (19:06 +0000)]
radeon/llvm: Move kernel arg lowering into R600TargetLowering class
Jordan Justen [Tue, 4 Sep 2012 18:16:28 +0000 (11:16 -0700)]
main/version: consolidate version string creation for ES/Desktop GL
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Eric Anholt [Fri, 31 Aug 2012 18:41:22 +0000 (11:41 -0700)]
i965: Stop putting 8 NOPs after each program.
As far as I can see, the intention of the requirement that we do so is to
prevent instruction prefetch from wandering out into either unmapped memory or
memory with a different caching type, and hanging the chip. The kernel makes
sure that the page after your BO has a valid page of the same caching type,
which meets this requirement, so there's no need to waste space between our
programs (and in instruction cache) on this.
Saves another 9kb instructions in l4d2 shaders.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 5 Sep 2012 14:42:36 +0000 (07:42 -0700)]
i965: Test instruction compaction on gen7
Kenneth Graunke [Sat, 11 Feb 2012 00:32:56 +0000 (16:32 -0800)]
i965: Add support for instruction compaction on Gen7.
Reduces l4d2 program size from 1195kb to 919kb. Improves performance by 0.22%
+/- 0.11% (n=70).
v2: Rebase on compaction v2, fix up flag reg handling (by anholt).
v3: Fix uncompaction of the flag register number.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 3 Feb 2012 13:17:11 +0000 (14:17 +0100)]
i965: Support instruction compaction between control flow.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Eric Anholt [Wed, 1 Feb 2012 00:55:20 +0000 (16:55 -0800)]
i965: Add support for instruction compaction.
This reduces program size by using some smaller encodings for common bit
patterns in the Gen ISA, with the hope of making programs fit in the
instruction cache better.
v2: Use larger bitshifts for the uncompressed field setups, in line with the
way it's described in the spec. Consistently name a brw_compile "p" like
all other code. Add a couple more tests. Consistently call things
"compacted" not "compressed" (which is a different feature). Drop the
explicit check for not compacting SENDs, which is unjustified and already
implied by our lack of support for immediate values.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Eric Anholt [Fri, 3 Feb 2012 11:05:05 +0000 (12:05 +0100)]
i965: Prepare the break/cont uip/jip setting for compacted instructions.
The first cut at instruction compaction won't compact things that
would change control flow jump distances, but we do need to still be
able to walk the instruction stream, which involves jumping by 8 or 16
bytes between instructions.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
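Walking a stream that mixes 8-byte compacted and 16-byte full instructions can be sketched as below. The flag position (bit 29 of the first dword) is an assumption made for this illustration, and the helper names are hypothetical; the sketch also assumes a well-formed stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed position of the "this instruction is compacted" flag. */
#define COMPACTED_FLAG (1u << 29)

/* Return the offset of the instruction following the one at offset. */
static size_t
next_insn_offset(const uint8_t *store, size_t offset)
{
   uint32_t dword0;
   memcpy(&dword0, store + offset, sizeof dword0);   /* avoids unaligned reads */
   return offset + ((dword0 & COMPACTED_FLAG) ? 8 : 16);
}

/* Count instructions by stepping 8 or 16 bytes at a time. */
static size_t
count_instructions(const uint8_t *store, size_t size)
{
   size_t count = 0;
   for (size_t offset = 0; offset + 8 <= size;
        offset = next_insn_offset(store, offset))
      count++;
   return count;
}
```

A 32-byte buffer holding a compacted, a full, and another compacted instruction is counted as three instructions, whereas a fixed 16-byte stride would see only two.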
Eric Anholt [Fri, 3 Feb 2012 10:50:42 +0000 (11:50 +0100)]
i965: Move program dump to a helper function in brw_eu.c.
It's going to get more complicated when we do instruction compaction. This
also introduces putting the program offset in the output.
v2: Use next_insn_offset in brw_get_program(), too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Eric Anholt [Thu, 2 Feb 2012 12:56:52 +0000 (13:56 +0100)]
i965: Make a linkable library for the contents of i965_dri.so.
To do unit testing of i965, we want to be able to link against the
driver's symbols and prod them. If we don't have a separate lib from
our loadable module, libtool gets super whiny.
Acked-by: Paul Berry <stereotype441@gmail.com>
Eric Anholt [Thu, 2 Feb 2012 13:11:08 +0000 (14:11 +0100)]
dri: Reuse dri_test.c for stub glapi symbols for unit testing.
This file is used to provide stubs for the link test in gallium dri drivers.
But the same stubs without the main can be used for making unit tests for code
in a dri driver.
Acked-by: Paul Berry <stereotype441@gmail.com>
Eric Anholt [Thu, 30 Aug 2012 23:22:52 +0000 (16:22 -0700)]
i965: Clear brw_compile on setup.
I noticed in valgrind that p->single_program_flow was used while
uninitialized. Everything else zeroed out brw_compile, but this is better
API.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Andreas Boll [Mon, 17 Sep 2012 15:29:34 +0000 (17:29 +0200)]
docs: remove obsolete mesa subset documentation
Reviewed-by: Brian Paul <brianp@vmware.com>
Michel Dänzer [Thu, 13 Sep 2012 14:48:49 +0000 (16:48 +0200)]
radeon/llvm: Match integer add/sub for SI.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>