Roland Scheidegger [Thu, 20 Feb 2014 02:09:17 +0000 (03:09 +0100)]
gallivm: add smallfloat to float conversion not relying on cpu denorm handling
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
Leo Liu [Wed, 19 Feb 2014 17:17:51 +0000 (12:17 -0500)]
st/omx/enc: add multi scaling buffers for performance improvement
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Christian König [Wed, 19 Feb 2014 17:49:17 +0000 (18:49 +0100)]
st/omx/dec/h264: fix prevFrameNumOffset handling
Signed-off-by: Christian König <christian.koenig@amd.com>
Kenneth Graunke [Mon, 10 Feb 2014 19:42:47 +0000 (11:42 -0800)]
i965: Actually claim to support MSAA on Broadwell.
We need to advertise 8x, 4x, and 2x multisamples. Previously, we only
claimed to support 0/1 samples.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 11 Feb 2014 00:48:14 +0000 (16:48 -0800)]
i965: Update physical width/height munging for 2x IMS MSAA.
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.
I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior. However, it only mentions 4x MSAA - not even 8x.
After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA. However, that page didn't mention 2x MSAA at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 11 Feb 2014 03:37:08 +0000 (19:37 -0800)]
i965: Enable smooth points when multisampling without point sprites.
According to the "Point Multisample Rasterization" of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.
However, if GL_POINT_SPRITE is enabled, you get square points no matter
what. Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.
Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 11 Feb 2014 02:17:10 +0000 (18:17 -0800)]
i965: Thwack multisample enable bit in 3DSTATE_RASTER.
The meaning and effects of this bit are surprisingly complicated.
See Rasterization > Windower > Multisampling > Multisample ModesState.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 11 Feb 2014 01:40:24 +0000 (17:40 -0800)]
i965: Only use the SIMD16 program for per-sample shading on Broadwell.
This restriction carries forward from earlier platforms. The code is
ported straight from gen7_wm_state.c.
v2: Actually do it right.
v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 11 Feb 2014 01:31:00 +0000 (17:31 -0800)]
i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Tue, 21 Jan 2014 07:06:30 +0000 (23:06 -0800)]
i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA.
v2: Also set the "oMask Present to Render Target" bit, which is required
for shaders that write oMask. Otherwise the hardware won't expect
the extra data.
v3: Add missing _NEW_MULTISAMPLE (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 23:46:56 +0000 (15:46 -0800)]
i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell.
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 23:09:22 +0000 (15:09 -0800)]
i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell.
Largely cut and paste from Gen7; it works the same way.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 20:02:14 +0000 (12:02 -0800)]
i965: Disable MCS on Broadwell for now.
v2: Add a perf_debug() message to remind us to come back to this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 19:18:41 +0000 (11:18 -0800)]
i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code.
We already set the number of samples, but were missing the MSAA layout
mode. Reusing gen7_surface_msaa_bits makes it easy to set both.
This also lets us drop the Gen8 surface_num_multisamples function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 19:06:03 +0000 (11:06 -0800)]
i965: Use ffs() for sample counting in gen7_surface_msaa_bits().
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs().
This also makes it reusable for Broadwell, which has 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Kenneth Graunke [Mon, 10 Feb 2014 00:14:27 +0000 (16:14 -0800)]
i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling.
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.
Suggested by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ian Romanick [Tue, 18 Feb 2014 17:38:04 +0000 (09:38 -0800)]
glsl: Silence "type qualifiers ignored on function return type" warning
The const in
const unsigned foo(void);
is meaningless. Removing it silences this warning:
src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ian Romanick [Tue, 18 Feb 2014 17:36:08 +0000 (09:36 -0800)]
glsl: Only warn for macro names containing __
From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:
"In addition, all identifiers containing two consecutive underscores
(__) are reserved as possible future keywords."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Names simply containing __ are dangerous to use, but should
be allowed.
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Ian Romanick [Tue, 18 Feb 2014 17:10:36 +0000 (09:10 -0800)]
glcpp: Only warn for macro names containing __
Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
GLSL ES spec (all versions) say:
"All macro names containing two consecutive underscores ( __ ) are
reserved for future use as predefined macro names. All macro names
prefixed with "GL_" ("GL" followed by a single underscore) are also
reserved."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
the name of the extension), that should be an error. Names simply
containing __ are dangerous to use, but should be allowed. In similar
cases, the C++ preprocessor specification says, "no diagnostic is
required."
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Tom Stellard [Tue, 4 Feb 2014 16:54:38 +0000 (11:54 -0500)]
configure: Use LLVM shared libraries by default
Linking with LLVM static libraries is easily broken by changes to
the llvm-config program or when LLVM adds, removes, or changes library
components. Keeping up with these changes requires a lot of maintanence
effort to keep the build working on the master and stable branches.
Also, because of issues in the past LLVM static libraries, the release
manager is currently configuring with --with-llvm-shared-libs when
checking the build before release. Enabling shared libraries by
default would allow the release manager to run ./configure with
no arguments, and be reasonably confident that the build would succeed.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:36:48 +0000 (15:36 +0100)]
i965/fs: Allocate the param_size array dynamically.
Useful because the total number of uniform components might exceed
MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be
passing as push constants.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:27:01 +0000 (15:27 +0100)]
i965/fs: Use a separate variable to keep track of the last uniform index seen.
Like the VEC4 back-end does. It will make dynamic allocation of the
param_size array easier in a future commit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Rob Clark [Wed, 19 Feb 2014 17:02:57 +0000 (12:02 -0500)]
freedreno: tweak ringbuffer sizes/count
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 19 Feb 2014 16:55:25 +0000 (11:55 -0500)]
freedreno/a3xx/compiler: scheduling/legalize fixes
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.
Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Francisco Jerez [Fri, 22 Nov 2013 23:59:56 +0000 (15:59 -0800)]
i965: Have brw_imm_vf4() take the vector components as integer values.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Sat, 23 Nov 2013 20:08:00 +0000 (12:08 -0800)]
i965: Add helper function to find out the signedness of a register type.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Fri, 29 Nov 2013 02:13:18 +0000 (18:13 -0800)]
i965/vec4: Use swizzle() in the ARB_vertex_program code.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Fri, 29 Nov 2013 02:04:10 +0000 (18:04 -0800)]
i965/fs: Use offset() in the ARB_fragment_program code.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Sun, 8 Dec 2013 03:59:11 +0000 (04:59 +0100)]
i965/fs: Remove fs_reg::retype.
There doesn't seem to be any reason for it to be a method, and it's
surprising that the expression 'reg.retype(t)' doesn't retype its
object but rather it creates a temporary with the new type. Use
'retype(reg, t)' instead.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Thu, 28 Nov 2013 23:07:06 +0000 (15:07 -0800)]
i965/vec4: Trivial improvements to the with_writemask() function.
Add assertion that the register is not in the HW_REG or IMM file,
calculate the conjunction of the old and new mask instead of replacing
the old [consistent with the behavior of brw_writemask(), causes no
functional changes right now], make it static inline to let the
compiler do a slightly better job at optimizing things, and shorten
its name.
v2: Assert that the new writemask is not zero to avoid undefined
hardware behaviour.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:21:07 +0000 (15:21 +0100)]
i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs.
And define non-mutating helper functions to retype fixed and normal
regs with a common interface. At some point we may want to get rid of
::fixed_hw_reg completely and have fixed regs use the normal register
data members (e.g. backend_reg::reg to select a fixed GRF number,
src_reg::swizzle to store the swizzle, etc.), I have the feeling that
this is not the last headache we're going to get because of the
multiple ways to represent the same thing and the different register
interface depending on the file a register is stored in...
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:20:27 +0000 (15:20 +0100)]
i965/vec4: Add non-mutating helper functions to modify src_reg::swizzle and ::negate.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:19:10 +0000 (15:19 +0100)]
i965: Add non-mutating helper functions to modify the register offset.
Yes, we could avoid having four copies of essentially the same code by
using templates here.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Sat, 23 Nov 2013 03:21:13 +0000 (19:21 -0800)]
i965/vec4: Fix off-by-one register class overallocation.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 27 Nov 2013 03:56:07 +0000 (19:56 -0800)]
i965: Unify fs_generator:: and vec4_generator::mark_surface_used as a free function.
This way it can be used anywhere. I need it from the visitor.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Wed, 19 Feb 2014 14:14:02 +0000 (15:14 +0100)]
i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data.
There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Francisco Jerez [Sat, 23 Nov 2013 04:22:03 +0000 (20:22 -0800)]
i965/vec4: Add constructor of src_reg from a fixed hardware reg.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Fri, 7 Feb 2014 22:02:36 +0000 (14:02 -0800)]
i965: Enable fast depth clears.
They work fine now, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Feb 2014 01:13:24 +0000 (17:13 -0800)]
i965: Enable HiZ on Broadwell.
It appears to work fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Feb 2014 01:06:12 +0000 (17:06 -0800)]
i965: Implement HiZ resolves on Broadwell.
Broadwell's 3DSTATE_WM_HZ_OP packet makes this much easier.
Instead of programming the whole pipeline, we simply have to emit the
depth/stencil packets, a state override, and a pipe control. Then
arrange for the state to be put back. This is easily done from a single
function.
v2: Use minify(mt->logical_{width,height}0, level) in 3DSTATE_WM_HZ_OP
instead of intel_mipmap_level's width/height fields. Those were
based on the physical width/height, and thus wrong for MSAA buffers.
Eric also deleted those fields.
v3: Use 0xFFFF as the sample mask regardless of what the user set (as
this operation is unrelated); set the drawing rectangle to the
miplevel being operated on, rather than the whole surface; remove
unnecessary MAX2(..., 1) around mt->logical_depth0 (all suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Feb 2014 01:02:37 +0000 (17:02 -0800)]
i965: Refactor Gen8 depth packet emission.
The existing code followed the vtable function signature, which is not a
great fit: many of the parameters are unused, and the function still
inspects global state, making it less reusable.
This patch refactors the depth buffer packet emission code into a new
function which takes exactly the parameters it needs, and which uses no
global state. It then makes the existing vtable function call the new
one.
Ideally, we would remove the vtable function, and clean up that
interface. But that can happen once HiZ is working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Fri, 7 Feb 2014 00:49:31 +0000 (16:49 -0800)]
i965: Add #defines for the 3DSTATE_WM_HZ_OP packet's contents.
We're going to need these to implement HiZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Thu, 6 Feb 2014 15:32:49 +0000 (07:32 -0800)]
i965: Bump generation check in code to disable HiZ at LODs > 0.
Broadwell's "HiZ Resolve" operation still has the restriction that the
rectangle primitive must be 8x4 aligned. So I believe we still need
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Thu, 6 Feb 2014 15:21:41 +0000 (07:21 -0800)]
i965: Program 3DSTATE_HIER_DEPTH_BUFFER properly on Broadwell.
HiZ buffers still don't exist, but when they do, we'll set them up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 8 Feb 2014 06:22:45 +0000 (22:22 -0800)]
i965: Pull format conversion logic out of brw_depthbuffer_format.
brw_depthbuffer_format is not very reusable at the moment, since it
uses global state (ctx->DrawBuffer) to access a particular depth buffer.
For HiZ on Broadwell, I need a function which simply converts the
formats. However, at least one existing user of brw_depthbuffer_format
really wants the existing interface. So, I've created a new function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Chia-I Wu [Wed, 19 Feb 2014 05:04:04 +0000 (13:04 +0800)]
egl: clarify what _eglInitResource does
It is a helper called from the initializers of its subclasses.
Chia-I Wu [Wed, 19 Feb 2014 04:57:15 +0000 (12:57 +0800)]
Revert "egl: Unhide functionality in _eglInitContext()"
This reverts commit
1456ed85f0ed8b9c9f0abd6bd389a089fa3824b2.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Chia-I Wu [Wed, 19 Feb 2014 04:57:11 +0000 (12:57 +0800)]
Revert "egl: Unhide functionality in _eglInitSurface()"
This reverts commit
498d10e230663f8604d00608cae6324f779c9cdd.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Kenneth Graunke [Sun, 2 Feb 2014 11:03:39 +0000 (03:03 -0800)]
i965: Bump MaxTexMbytes from 1GB to 1.5GB.
Even with the other limits raised, TestProxyTexImage would still reject
textures > 1GB in size. This is an artificial limit; nothing prevents
us from having a larger texture. I stayed shy of 2GB to avoid the
larger-than-aperture situation.
For 3D textures, this raises the effective limit:
- RGBA8: 645 -> 738
- RGBA16: 512 -> 586
- RGBA32F: 406 -> 465
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Sun, 2 Feb 2014 10:58:42 +0000 (02:58 -0800)]
i965: Bump GL_MAX_CUBE_MAP_TEXTURE_SIZE to 8192.
Gen4+ supports 8192x8192 cube maps. Ivybridge and later can actually
support 16384, but that would place GL_MAX_CUBE_MAP_TEXTURE_SIZE above
GL_MAX_TEXTURE_SIZE, which seems like a bad idea.
(Unfortunately, we can't bump GL_MAX_TEXTURE_SIZE to 16384 without
causing regressions due to awful W-tiled stencil buffer interactions.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Sun, 2 Feb 2014 10:51:45 +0000 (02:51 -0800)]
i965: Bump MAX_3D_TEXTURE_SIZE to 2048.
It's highly unlikely that there will be enough memory in the system to
allocate enough space for this, but we should still expose the hardware
limit. It's what the Intel Windows driver does, and it seems most other
vendors do likewise.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Tue, 18 Feb 2014 23:24:21 +0000 (15:24 -0800)]
docs: Trivial updates to MESA_query_renderer.spec
Fix the version and the status before sending to Khronos for listing in
the registry.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Sinclair Yeh [Thu, 13 Feb 2014 00:21:11 +0000 (16:21 -0800)]
Prevent zero sized wl_egl_window
It is illegal to create or resize a window to zero (or negative) width
and/or height. This patch prevents such a request from happening.
Anuj Phogat [Thu, 19 Dec 2013 22:17:19 +0000 (14:17 -0800)]
glsl: Fix condition to generate shader link error
GL_ARB_ES2_compatibility doesn't say anything about shader linking
when one of the shaders (vertex or fragment shader) is absent. So,
the extension shouldn't change the behavior specified in GLSL
specification.
Tested the behavior on proprietary linux drivers of NVIDIA and AMD.
Both of them allow linking a version 100 shader program in OpenGL
context, when one of the shaders is absent.
Makes following Khronos CTS tests to pass:
successfulcompilevert_linkprogram.test
successfulcompilefrag_linkprogram.test
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Anuj Phogat [Sat, 15 Feb 2014 01:27:29 +0000 (17:27 -0800)]
mesa: Add GL_TEXTURE_CUBE_MAP_ARRAY to legal_get_tex_level_parameter_target()
Fixes failing Khronos CTS test packed_depth_stencil_init.test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Fri, 14 Feb 2014 05:37:50 +0000 (21:37 -0800)]
i965/fs: Use conditional sends to do FB writes on HSW+.
This drops the MOVs for header setup, which are totally mis-scheduled.
total instructions in shared programs:
1590047 ->
1589331 (-0.05%)
instructions in affected programs: 43729 -> 43013 (-1.64%)
GAINED: 0
LOST: 0
glb27-trex:
x before
+ after
+-----------------------------------------------------------------------------+
| + x xx + + + |
| ++ + xxx ++x xx + ** *x+ + + + x * |
|+x xx x* x+++xx*x*xx+++*+*xx++** *x* x+***x*+xx+* + * + + *|
| |__|__________MA___A___________|___| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 49 62.33 65.41 63.49 63.53449 0.
62757822
+ 50 62.28 65.4 63.7 63.6982 0.656564
No difference proven at 95.0% confidence
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Fri, 14 Feb 2014 23:41:40 +0000 (15:41 -0800)]
i965/fs: Drop dead comment about the old proj_attrib_mask optimization.
The code was removed early last year.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 14 Feb 2014 21:00:40 +0000 (13:00 -0800)]
i965: Drop mt->levels[].width/height.
It often confused people because it was unclear on whether it was the
physical or logical, and people needed the other one as well. We can
recompute it trivially using the minify() macro, clarifying which value is
being used and making getting the other value obvious.
v2: Fix a pasteo in intel_blit.c's dst flip.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Feb 2014 18:45:35 +0000 (10:45 -0800)]
i965: Move singlesample_mt to the renderbuffer.
Since only window system renderbuffers can have a singlesample_mt, this
lets us drop a bunch of sanity checking to make sure that we're just a
renderbuffer-like thing.
v2: Fix a badly-written comment (thanks Kenneth!), drop the now trivial
helper function for set_needs_downsample.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Feb 2014 22:33:57 +0000 (14:33 -0800)]
i965: Drop some duplicated code in DRI winsys BO updates.
The only DRI2 vs DRI3 delta was just how to decide about frontbuffer-ness
for doing the upsample.
v2: Fix missing singlesample_mt->region->name update in the merged code,
which would have broken the DRI2 don't-recreate-the-miptree
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Feb 2014 18:52:47 +0000 (10:52 -0800)]
i965: Simplify intel_miptree_updownsample.
Pretty silly to pass in values dereferenced out of one of the arguments.
v2: Get the destination size from the dst, even though the callers are
always dealing with src size == dst size cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 13 Feb 2014 23:58:21 +0000 (15:58 -0800)]
i965: Don't try to use the ctx->ReadBuffer when asked to blorp miptrees.
So far it's happened to be that we're only ever calling
intel_miptree_blit() (up/downsampling) from the ReadBuffer, but I stumbled
over a null ReadBuffer case when debugging later parts of the series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 7 Feb 2014 22:20:34 +0000 (14:20 -0800)]
i965: Make the mt->target of multisample renderbuffers be 2D_MS.
Mostly mt->target == 2D_MS just results in a few checks that we don't try
to allocate multiple LODs and don't try to do slice copies with them. But
with the introduction of binding renderbuffers to textures, we need more
consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 11 Feb 2014 20:10:59 +0000 (12:10 -0800)]
meta: Push into desktop GL mode when doing meta operations.
This lets us simplify our shaders, and rely on GLES-prohibited
functionality (like ARB_texture_multisample) when writing these
driver-internal functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 14 Feb 2014 22:31:17 +0000 (14:31 -0800)]
meta: Fix blit shader compile on non-glsl-130 drivers.
Compare this VS to the one for the post-130 case. Fixes piglit
glsl-lod-bias, and presumably tons of other code (I haven't done a full
piglit run on swrast).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74911
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Tue, 18 Feb 2014 13:12:37 +0000 (08:12 -0500)]
configure: fix build error with XA
Fixes:
xa_tracker.c: In function 'xa_tracker_create':
xa_tracker.c:147:5: error: implicit declaration of function 'pipe_loader_drm_probe_fd' [-Werror=implicit-function-declaration]
in some build configurations, as XA now implicitly depends on
gallium_drm_loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Michel Dänzer [Thu, 13 Feb 2014 02:51:09 +0000 (11:51 +0900)]
r600g,radeonsi: Consolidate logic for short-circuiting flushes
Fixes radeonsi emitting command streams to the kernel even when there
have been no draw calls before a flush, potentially powering up the GPU
needlessly.
Incidentally, this also cuts the runtime of piglit gpu.py in about half
on my Kaveri system, probably because an X11 client going away no longer
always results in a command stream being submitted to the kernel via
glamor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761
Cc: "10.1" mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Thu, 13 Feb 2014 00:44:32 +0000 (00:44 +0000)]
st/dri: remove #ifdef DRM_CAP_PRIME guard
Required for libdrm 2.4.37 and earlier. Both scons and automake
require version 2.4.38 now so that guard is not longer needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Tue, 18 Feb 2014 00:08:03 +0000 (00:08 +0000)]
automake: remove leftover XORG and LIBKMS variables
No longer set or used since the removal of st/xorg.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Thu, 13 Feb 2014 00:28:27 +0000 (00:28 +0000)]
scons: sync package requirements
xorg-server and libkms is no longer required since the removal
of the xorg state-tracker.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Thu, 13 Feb 2014 00:23:16 +0000 (00:23 +0000)]
configure: bump up libdrm requirement to 2.4.38
This is the first version that introduced DRM_CAP_PRIME, which is
implicitly required by egl/wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Tue, 4 Feb 2014 17:32:03 +0000 (17:32 +0000)]
configure: use test -n whenever possible
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Tue, 11 Feb 2014 14:47:37 +0000 (14:47 +0000)]
configure: use test -z whenever possible
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Tue, 4 Feb 2014 17:26:38 +0000 (17:26 +0000)]
configure: cleanup classic dri drivers handling
* Make sure that only drivers that are handled by configure.ac
are included in DRI_DIRS.
* Change with_dri_drivers default value to auto, and set enable
autodetection, when enable_opengl is on.
v2: Move "test" to the correct location.
v3: Squash DRI_DIRS handling before the switch statement.
Suggested by Ilia Mirkin
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 3 Feb 2014 21:05:19 +0000 (21:05 +0000)]
configure: compact ppc/sparc DRI_DIRS handling
Both arches have the same list of dri_dirs.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 3 Feb 2014 20:54:08 +0000 (20:54 +0000)]
configure: drop explicit DRI_DIRS assignment on some platforms/arches
Both x86_64|amd64 and *bsd, already set the full range of available
classic dri drivers. Drop the explicit assignment, and fall back to
the generic default.
Keep explicit list from plafroms/arches that do not handle the default
list.
Update help strings, to explicitly mention "classic" for applicable
DRI drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Mon, 3 Feb 2014 20:38:10 +0000 (20:38 +0000)]
configure: cleanup switch statement
Move all the cases within one switch statement and handle
i9{1,6}5 and r{adeon,200} independently.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kusanagi Kouichi [Mon, 17 Feb 2014 08:29:14 +0000 (17:29 +0900)]
targets/vdpau: Don't link unused libraries
libvdpau, libselinux and libexpat are not used.
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Kusanagi Kouichi [Sat, 15 Feb 2014 02:53:00 +0000 (11:53 +0900)]
configure: Try pkg-config first for libselinux
v2 (Emil) Add SELINUX_CFLAGS in the respective locations
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Kusanagi Kouichi [Wed, 12 Feb 2014 07:07:55 +0000 (16:07 +0900)]
targets/vdpau: Always use c++ to link
If built without llvm, the following error occurs with mplayer:
Failed to open VDPAU backend .../libvdpau_r600.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Ilia Mirkin [Sun, 16 Feb 2014 07:21:59 +0000 (02:21 -0500)]
st/xvmc: fix tests so that they pass
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
Rob Clark [Mon, 10 Feb 2014 15:45:36 +0000 (10:45 -0500)]
pipe-loader: add pipe loader for freedreno/msm
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 10 Feb 2014 15:44:02 +0000 (10:44 -0500)]
st/xa: missing handle type
DRM_API_HANDLE_TYPE_SHARED is zero, so doesn't actually fix anything.
But we shouldn't rely on SHARED handle type being zero.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 10 Feb 2014 15:43:11 +0000 (10:43 -0500)]
st/xa: use pipe-loader to get screen
This lets multiple gallium drivers use XA.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 10 Feb 2014 14:39:23 +0000 (09:39 -0500)]
pipe-loader: split out "client" version
Build two versions of pipe-loader, with only the client version linking
in x11 client side dependencies. This will allow the XA state tracker
to use pipe-loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 16 Feb 2014 00:01:38 +0000 (19:01 -0500)]
freedreno/a3xx/compiler: use (ss) for WAR hazards
Seems texture sample instructions don't immediately consume there
src(s). In fact, some shaders from blob compiler seem to indiciate that
it does not even count the texture sample instructions when calculating
number of delay slots to fill for non-sample instructions. (Although so
far it seems inconclusive as to whether this is required.)
In particular, when a src register of a previous texture sample
instruction is clobbered, the (ss) bit is needed to synchronize with the
tex pipeline to ensure it has picked up the previous values before they
are overwritten.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 16 Feb 2014 12:41:59 +0000 (07:41 -0500)]
freedreno/a3xx/compiler: fix RA typo
Was supposed to be a '+', otherwise we end up with a negative offset and
choosing registers below the assigned range.
This seems to fix the scheduling mystery "solved" by adding in extra
delay slots.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 16 Feb 2014 12:35:20 +0000 (07:35 -0500)]
freedreno/a3xx/compiler: handle kill properly (new compiler)
Since 'kill' does not produce a result, the new compiler was happily
optimizing them out. We need to instead track 'kill's similar to
outputs. But since there is no non-predicated kill instruction,
(and for flattend if/else we do want them to be predicated), we need
to track the topmost branch condition on the stack and use that as src
arg to the kill. For a kill at the topmost level, we have to generate
an immediate 1.0 to feed into the cmps.f for setting the predicate
register.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 16 Feb 2014 12:29:13 +0000 (07:29 -0500)]
freedreno/a3xx/compiler: trans_cmp() sanity
Thanks to figuring out 32bit float render target, and adding regdump
test in fdre-a3xx, I can more easily play around with instructions to
figure out range of inputs/outputs/etc. And from this I can conclude
that cmps.f works more like expected and I can do something much more
simple in trans_cmp() (compared to before which was more closely
emulating the instruction sequence of the blob compiler).
And using sel.b32 (binary 0/1) often makes more sense than sel.f32
(+/- float) or sel.u32 (+/- uint) as it can use the output directly
from cmps.f without needing the 'add.s tmp0, tmp0, -1'.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 10 Feb 2014 14:14:15 +0000 (09:14 -0500)]
freedreno: fix problems if no color buf bound
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Eric Anholt [Wed, 5 Feb 2014 23:45:25 +0000 (15:45 -0800)]
meta: Don't try to enable FF texturing when we're using GLSL.
On a core context, this would throw an error.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Carl Worth [Thu, 13 Feb 2014 17:49:27 +0000 (09:49 -0800)]
main: Avoid double-free of shader Label
As documented, the _mesa_free_shader_program_data function:
"Frees all the data that hangs off a shader program object, but not
the object itself."
This means that this function may be called multiple times on the same object,
(and has been observed to). Meanwhile, the shProg->Label field was not being
set to NULL after its free(). This led to a second call to free() of the same
address on the second call to this function.
Fix this by setting this field to NULL after free(), (just as with all other
calls to free() in this function).
Reviewed-by: Brian Paul <brianp@vmware.com>
CC: mesa-stable@lists.freedesktop.org
Brian Paul [Fri, 14 Feb 2014 14:45:23 +0000 (07:45 -0700)]
gallium/pipebuffer: change pb_cache_manager_create() size_factor to float
Requested by Marek.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Thomas Hellstrom [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
svga/winsys: Propagate surface shared information to the winsys
The linux winsys needs to know whether a surface is shared.
For guest-backed surfaces we need this information to avoid allocating a
mob out of the mob cache for shared surfaces, but instead allocate a shared
mob, that is never put in the mob cache, from the kernel.
Also previously, all surfaces were given the "shareable" attribute when
allocated from the kernel. This is too permissive for client-local surfaces.
Now that we have the needed info, only set the "shareable" attribute if the
client indicates that it needs to share the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Brian Paul [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
svga/winsys: implement GBS support
This is a squash commit of many commits by Thomas Hellstrom.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Thomas Hellstrom [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
gallium/util: Add flush/map debug utility code
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Thomas Hellstrom [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
gallium/pipebuffer: Add a cache buffer manager bypass mask
In some situations, it may be desirable to bypass the cache at buffer
creation but to insert the buffer in the cache at buffer destruction.
One such situation is where we already have a kernel representation of a
buffer that we want to use, but we also want to insert it in the cache when
it's freed up.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Thomas Hellstrom [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
pipebuffer, winsys: Add a size match parameter to the cached buffer manager
In some situations it's important to restrict the sizes of buffers that the
cached buffer manager is allowed to return
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Brian Paul [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
svga: update texture code for GBS
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Brian Paul [Sat, 8 Feb 2014 17:51:15 +0000 (09:51 -0800)]
svga: update buffer code for GBS
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>