Kenneth Graunke [Tue, 11 Apr 2017 07:04:29 +0000 (00:04 -0700)]
i965/drm: Use bools for a few flags.
These one bit values are booleans.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Tue, 11 Apr 2017 07:02:35 +0000 (00:02 -0700)]
i965/drm: Make brw_bo_alloc_tiled flags parameter 32-bit.
unsigned long is a terrible type for a bitfield - if you need fewer
than 32 bits, it wastes 4 bytes. If you need more, things break on
32-bit builds. Just use unsigned.
Even that's a bit ridiculous as we only have one flag today.
Still, it's at least somewhat better.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
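As a rough illustration of the flags change above, the sketch below passes a one-bit flag set through a plain unsigned parameter. The name and value of EXAMPLE_BO_ALLOC_FOR_RENDER and both helpers are hypothetical, loosely modelled on the BO_ALLOC_FOR_RENDER flag mentioned elsewhere in this log; they are not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real driver's value may differ. */
#define EXAMPLE_BO_ALLOC_FOR_RENDER (1u << 0)

/* Flags travel as `unsigned`, which has the same width on 32-bit and
 * 64-bit builds of the common platforms (unlike `unsigned long`). */
static bool
example_alloc_is_for_render(unsigned flags)
{
   return (flags & EXAMPLE_BO_ALLOC_FOR_RENDER) != 0;
}

/* `unsigned long` is never narrower than `unsigned`; on LP64 systems it
 * is 8 bytes, which is where the wasted 4 bytes come from. */
static bool
example_unsigned_long_is_wider_or_equal(void)
{
   return sizeof(unsigned long) >= sizeof(unsigned);
}
```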
Kenneth Graunke [Tue, 11 Apr 2017 06:10:04 +0000 (23:10 -0700)]
i965/drm: Make BO size a uint64_t rather than unsigned long.
The drm_i915_gem_create ioctl structure uses a __u64 for the size,
so we should probably use uint64_t to match. In theory, we could
probably have a BO larger than 4GB, using a 48-bit PPGTT - it just
wouldn't be mappable in the CPU's 32-bit address space.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
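To make the size argument above concrete, here is a minimal sketch of what happens when a BO size larger than 4GB passes through a 32-bit field. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit size field would keep of a 64-bit BO size. */
static uint32_t
example_size_as_32bit(uint64_t size)
{
   return (uint32_t)size;   /* silently drops the upper 32 bits */
}

/* True if the size survives a round trip through a 32-bit field. */
static int
example_size_fits_32bit(uint64_t size)
{
   return size <= UINT32_MAX;
}
```

A 5GB request would come back as 1GB, which is why matching the kernel's __u64 is the safer choice.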
Kenneth Graunke [Tue, 11 Apr 2017 06:55:21 +0000 (23:55 -0700)]
i965/drm: Make alignment parameter a uint64_t.
Theoretically, with a 48-bit address space, we could have buffers
with an alignment of >= 4GB. It's a bit silly, but the exec_object
structs (drm_i915_gem_exec_object2) use a __u64 for this, so we may
as well use the same type as the kernel API.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
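A small sketch of alignment arithmetic done entirely in 64 bits, as the alignment change above suggests. The helper name is hypothetical, and it assumes power-of-two alignments.

```c
#include <assert.h>
#include <stdint.h>

/* Round `value` up to a power-of-two `alignment` using 64-bit math, so
 * alignments of 4GB or more do not overflow. */
static uint64_t
example_align_u64(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}
```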
Kenneth Graunke [Tue, 11 Apr 2017 06:08:23 +0000 (23:08 -0700)]
i965/drm: Make stride/pitch a uint32_t.
struct drm_i915_gem_set_tiling's stride field is a __u32.
intel_mipmap_tree::stride is a uint32_t. Using unsigned long just
doesn't make sense. Switching also lets us drop many pointless
locals that only existed to deal with the type mismatch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Tue, 11 Apr 2017 06:00:24 +0000 (23:00 -0700)]
i965/drm: Fix types for pwrite/pread fields.
The ioctl structs contain __u64 offset and size fields, so make them
uint64_t rather than unsigned long.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Kenneth Graunke [Tue, 11 Apr 2017 06:31:20 +0000 (23:31 -0700)]
i965/drm: Make brw_bo_alloc_tiled take tiling by value, not pointer.
For some reason we passed tiling by pointer, through several layers,
even though the functions only read the initial value, and never
actually change it. We even had a do-while loop that executed until
the tiling mode matched - except it always did, so it only ran once.
We then had bogus error handling in case it changed the tiling mode
to something nonsensical...which it never did.
Drop all this nonsense.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Timothy Arceri [Tue, 11 Apr 2017 04:30:15 +0000 (14:30 +1000)]
mesa/st: remove _mesa_get_fallback_texture() calls
These calls look like leftovers from fallback texture support first
being added to the st in 8f6d9e12be0be and then later being added to
core mesa in 00e203fe17cbf21.
The piglit test fp-incomplete-tex continues to work with this
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Timothy Arceri [Mon, 10 Apr 2017 12:21:37 +0000 (22:21 +1000)]
mesa: use pre_hashed version of search for the mesa hash table
The key is just an unsigned int so there is never any real hashing
done.
Reviewed-by: Eric Anholt <eric@anholt.net>
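The idea behind the pre-hashed lookup above can be sketched with a tiny stand-alone table. This is not Mesa's hash table implementation; the struct, the fixed table size, and all function names are hypothetical. It only shows that when the key is an integer, the caller can pass the key itself as the hash and skip the hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TABLE_SIZE 16u   /* power of two, so masking replaces modulo */

struct example_entry {
   uint32_t hash;
   uint32_t key;
   void *data;
   int used;
};

struct example_table {
   struct example_entry entries[EXAMPLE_TABLE_SIZE];
};

/* Linear-probing insert that takes the hash from the caller. */
static int
example_insert_pre_hashed(struct example_table *t, uint32_t hash,
                          uint32_t key, void *data)
{
   for (uint32_t i = 0; i < EXAMPLE_TABLE_SIZE; i++) {
      struct example_entry *e =
         &t->entries[(hash + i) & (EXAMPLE_TABLE_SIZE - 1)];
      if (!e->used || (e->hash == hash && e->key == key)) {
         e->hash = hash;
         e->key = key;
         e->data = data;
         e->used = 1;
         return 1;
      }
   }
   return 0;   /* table full */
}

/* Linear-probing search that takes the hash from the caller. */
static void *
example_search_pre_hashed(const struct example_table *t, uint32_t hash,
                          uint32_t key)
{
   for (uint32_t i = 0; i < EXAMPLE_TABLE_SIZE; i++) {
      const struct example_entry *e =
         &t->entries[(hash + i) & (EXAMPLE_TABLE_SIZE - 1)];
      if (!e->used)
         return NULL;   /* an empty slot ends the probe sequence */
      if (e->hash == hash && e->key == key)
         return e->data;
   }
   return NULL;
}

/* For integer IDs the key doubles as its own hash. */
static void *
example_lookup_id(const struct example_table *t, uint32_t id)
{
   return example_search_pre_hashed(t, id, id);
}
```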
Tim Rowley [Fri, 7 Apr 2017 21:51:42 +0000 (16:51 -0500)]
swr: [rasterizer core] Disable 8x2 tile backend
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 21:31:36 +0000 (16:31 -0500)]
swr: [rasterizer common] Add _simd_testz_si alias
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 16:41:25 +0000 (11:41 -0500)]
swr: [rasterizer archrast] Fix archrast for MSVC 2017 compiler
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 15:58:38 +0000 (10:58 -0500)]
swr: [rasterizer jitter] Remove unused function
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 12:57:11 +0000 (07:57 -0500)]
swr: [rasterizer jitter] Remove HAVE_LLVM tests supporting llvm < 3.8
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 09:37:25 +0000 (04:37 -0500)]
swr: [rasterizer common/core] Fix 32-bit windows build
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 7 Apr 2017 03:11:45 +0000 (22:11 -0500)]
swr: [rasterizer core] Fix unused variable warnings
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 6 Apr 2017 23:21:54 +0000 (18:21 -0500)]
swr: [rasterizer core] Code formatting change
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 6 Apr 2017 21:37:03 +0000 (16:37 -0500)]
swr: [rasterizer core] SIMD16 Frontend WIP - PA
Fix PA NextPrim for SIMD8 on SIMD16.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 6 Apr 2017 20:22:55 +0000 (15:22 -0500)]
swr: [rasterizer core] SIMD16 Frontend WIP - Clipper
Implement widened clipper for SIMD16.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Sat, 1 Apr 2017 01:33:43 +0000 (20:33 -0500)]
swr: [rasterizer core] Multisample sample position setup change
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Fri, 31 Mar 2017 21:50:40 +0000 (16:50 -0500)]
swr: [rasterizer core] Reduce templates to speed compile
Quick patch to remove some unused template params to cut down
rasterizer compile time.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Francisco Jerez [Mon, 10 Apr 2017 00:28:58 +0000 (17:28 -0700)]
i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block. Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.
Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e. Should have a comparable effect on other platforms. No
significant regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
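The weighting idea described above can be sketched as a single pass over the instruction stream that tracks a running scale factor. This is a simplified model, not the compiler's code: the opcode enum, the function name, and the specific factors (0.5 for conditional blocks, 10 for loop bodies) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum example_op {
   EXAMPLE_OP_ALU,
   EXAMPLE_OP_IF,
   EXAMPLE_OP_ELSE,
   EXAMPLE_OP_ENDIF,
   EXAMPLE_OP_DO,
   EXAMPLE_OP_WHILE,
};

/* Assumed factors: a branch runs "somewhere between 0 and 1" times per
 * parent execution, a loop body several times. */
#define EXAMPLE_IF_FACTOR   0.5f
#define EXAMPLE_LOOP_FACTOR 10.0f

/* Give each instruction the scale in effect when it is reached; a
 * spill/fill at that instruction would be charged weights[i]. */
static void
example_spill_weights(const enum example_op *ops, size_t count,
                      float *weights)
{
   float scale = 1.0f;

   for (size_t i = 0; i < count; i++) {
      weights[i] = scale;

      switch (ops[i]) {
      case EXAMPLE_OP_IF:    scale *= EXAMPLE_IF_FACTOR;   break;
      case EXAMPLE_OP_ENDIF: scale /= EXAMPLE_IF_FACTOR;   break;
      case EXAMPLE_OP_DO:    scale *= EXAMPLE_LOOP_FACTOR; break;
      case EXAMPLE_OP_WHILE: scale /= EXAMPLE_LOOP_FACTOR; break;
      default:               break;   /* ELSE keeps the branch scale */
      }
   }
}
```

Registers used only inside the if/else body end up with a lower accumulated cost than registers used at the top level, so they become the preferred spill candidates.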
Tim Rowley [Tue, 11 Apr 2017 16:50:23 +0000 (11:50 -0500)]
swr: return true for PIPE_CAP_DOUBLES
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kenneth Graunke [Tue, 11 Apr 2017 15:33:20 +0000 (08:33 -0700)]
i965: Set kernel features before computing max GL version.
We check these bitfields when computing the Haswell max GL version.
We need to set them ahead of time, or they won't exist, and all our
checks will fail. That sets the max core profile GL version to 4.2.
This introduces the bizarre situation where asking for a GL context
with version 4.3+ fails, but asking for a GL core profile context
with version <= 4.2 actually promotes you to a 4.5 context.
GLX_MESA_query_renderer also reported the bogus 4.2 value.
Now it shows 4.5.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
Juan A. Suarez Romero [Tue, 11 Apr 2017 11:15:31 +0000 (13:15 +0200)]
anv: remove needless VALGRIND_MAKE_MEM_DEFINED
This is already invoked in the following VG_NOACCESS_READ() call.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lucas Stach [Mon, 21 Nov 2016 11:32:15 +0000 (12:32 +0100)]
etnaviv: enable TS, but disable autodisable
Autodisable seems to cause missed rendering in some cases, but
otherwise TS seems to work properly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Lucas Stach [Mon, 21 Nov 2016 11:29:04 +0000 (12:29 +0100)]
etnaviv: enable TS also on sampler resources
Fixes a performance issue with imported winsys buffers as those are
marked with binding sampler view.
This might require a TS flush on single pipe chips that directly
sample from the rendered buffer, but otherwise seems to work fine.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Lucas Stach [Mon, 21 Nov 2016 11:27:47 +0000 (12:27 +0100)]
etnaviv: align TS surface size to number of pixel pipes
The TS surface gets cleared by a tiled RS fill. If the chip has
more than 1 pixel pipe the size of the TS surface needs to be
aligned so that each pipe address matches a tile start, otherwise
the RS will hang.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
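One way to picture the TS size alignment above: if the surface is split evenly between pixel pipes, every pipe's start lands on a tile boundary only when the total size is a multiple of (pipes * tile size). The sketch below assumes exactly that model; the function name and the tile_bytes parameter are hypothetical, and the real hardware granule is not stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Round a TS surface size up so that an even split across `pixel_pipes`
 * leaves every pipe starting on a `tile_bytes` boundary. */
static uint32_t
example_align_ts_size(uint32_t size, uint32_t pixel_pipes,
                      uint32_t tile_bytes)
{
   assert(pixel_pipes > 0 && tile_bytes > 0);

   const uint32_t granule = pixel_pipes * tile_bytes;
   return ((size + granule - 1) / granule) * granule;
}
```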
Lucas Stach [Mon, 21 Nov 2016 11:25:29 +0000 (12:25 +0100)]
etnaviv: avoid using invalid TS
The TS is only valid after it has been initialized by a fast
clear, so it should not be taken into account when blitting
resources that haven't been cleared. Also the blit itself
invalidates the destination TS, as it's not updated and will
retain data from the previous rendering after the blit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Samuel Pitoiset [Mon, 10 Apr 2017 17:23:17 +0000 (19:23 +0200)]
glsl: use the BA1 macro for textureQueryLevels()
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Mon, 10 Apr 2017 17:23:16 +0000 (19:23 +0200)]
glsl: use the BA1 macro for textureSamples()
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Mon, 10 Apr 2017 17:23:15 +0000 (19:23 +0200)]
glsl: use the BA1 macro for textureCubeArrayShadow()
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bas Nieuwenhuizen [Mon, 10 Apr 2017 20:20:19 +0000 (22:20 +0200)]
radv: Implement pipeline statistics queries.
The devil is in the shader again, otherwise this is
fairly straightforward.
The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 10 Apr 2017 21:54:51 +0000 (23:54 +0200)]
radv: Let count be dynamic in radv_break_on_count.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 10 Apr 2017 19:49:48 +0000 (21:49 +0200)]
radv: Rename query pipeline/set layout.
For using them with both occlusion and pipeline statistics queries.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 10 Apr 2017 19:46:07 +0000 (21:46 +0200)]
radv: Use VK_WHOLE_SIZE for the query buffer bindings.
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 9 Apr 2017 20:35:32 +0000 (22:35 +0200)]
radv: Use a shader for occlusion CmdCopyQueryPoolResults.
Use the new occlusion query copy shader.
We don't use the shader for the waiting as a polling loop interacts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.
If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.
This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 26 Feb 2017 17:21:01 +0000 (18:21 +0100)]
radv: Add occlusion query shader.
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't supported on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
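A CPU-side sketch of the kind of work such a result-writing shader has to do. Everything here is an assumption made for illustration: the layout (8 begin/end counter pairs per query, with the top bit of each 64-bit value marking it as written), the struct, and the function names. The real shader is generated by the driver and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NUM_DB    8              /* assumed counter pairs per query */
#define EXAMPLE_VALID_BIT (1ull << 63)   /* assumed "value written" marker */

struct example_query_result {
   uint64_t value;
   bool available;
};

/* Sum end - begin over all counter pairs; the query is only available
 * once every pair has been written. Pairs not yet written are skipped,
 * which yields a partial result. */
static struct example_query_result
example_accumulate(const uint64_t begin[EXAMPLE_NUM_DB],
                   const uint64_t end[EXAMPLE_NUM_DB])
{
   struct example_query_result r = { 0, true };

   for (int i = 0; i < EXAMPLE_NUM_DB; i++) {
      if ((begin[i] & EXAMPLE_VALID_BIT) && (end[i] & EXAMPLE_VALID_BIT))
         r.value += end[i] - begin[i];   /* valid bits cancel out */
      else
         r.available = false;
   }
   return r;
}

/* Results requested as 32-bit are truncated. */
static uint32_t
example_truncate_result(uint64_t value)
{
   return (uint32_t)value;
}
```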
Kenneth Graunke [Tue, 11 Apr 2017 06:23:13 +0000 (23:23 -0700)]
i965: Fix wonky indentation left by brw_bo_alloc_tiled rename.
Ilia Mirkin [Sat, 8 Apr 2017 22:31:35 +0000 (18:31 -0400)]
nouveau: when mapping a persistent buffer, synchronize on former xfers
If the buffer is being used, we should wait for those uses to be
complete before returning the map.
Fixes: GL45-CTS.direct_state_access.buffers_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Ilia Mirkin [Sat, 8 Apr 2017 18:56:16 +0000 (14:56 -0400)]
nvc0: increase texture buffer object alignment to 256 for pre-GM107
We currently don't pass the low byte of the address via the surface
info, so in order to work with images, these have to implicitly be
aligned to 256. The proprietary driver also doesn't go out of its way to
provide lower alignment.
Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
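Why dropping the low byte of the address forces 256-byte alignment can be shown with a short sketch. The encode/decode helpers are hypothetical and do not reflect the actual surface info layout; they only model "the low 8 bits are not passed".

```c
#include <assert.h>
#include <stdint.h>

/* Model of an address field that omits the low byte. */
static uint32_t
example_encode_address(uint64_t address)
{
   return (uint32_t)(address >> 8);
}

static uint64_t
example_decode_address(uint32_t encoded)
{
   return (uint64_t)encoded << 8;
}

/* Only 256-byte aligned addresses survive the round trip unchanged. */
static int
example_address_is_usable(uint64_t address)
{
   return (address & 0xff) == 0;
}
```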
Timothy Arceri [Mon, 10 Apr 2017 23:57:45 +0000 (09:57 +1000)]
mesa: fix typo and add assert() to _mesa_attach_renderbuffer_without_ref()
This function should only be used with a "freshly created" renderbuffer
so assert RefCount is 1.
Kenneth Graunke [Mon, 10 Apr 2017 06:14:56 +0000 (23:14 -0700)]
i965/drm: Add stall warnings when mapping or waiting on BOs.
This restores the performance warnings removed in:
i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
but adds them for nearly all BO mapping, and also for wait_rendering.
Because we add this to the core bufmgr, we automatically get stall
warnings in all callers, unlike before where only a few callsites used
the wrappers that gave stall warnings.
We also do it a bit differently: we simply measure how long set_domain
takes (the part that stalls), and complain if it's more than 0.01 ms.
We don't bother calling brw_bo_busy(), and we don't measure the mmap
time (which doesn't stall). This should be more accurate.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
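The stall check described above boils down to comparing an elapsed time against a threshold. In the sketch below the two timestamps are assumed to come from a monotonic clock sampled immediately before and after the set_domain call; the helper names and the nanosecond representation are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the commit message: complain above 0.01 ms. */
#define EXAMPLE_STALL_THRESHOLD_MS 0.01

/* Convert two monotonic timestamps (in nanoseconds) to elapsed ms. */
static double
example_elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
   return (double)(end_ns - start_ns) / 1e6;
}

/* True if the wait took long enough to deserve a perf warning. */
static bool
example_should_warn(uint64_t start_ns, uint64_t end_ns)
{
   return example_elapsed_ms(start_ns, end_ns) > EXAMPLE_STALL_THRESHOLD_MS;
}
```

Timing only the blocking call, rather than asking whether the BO is busy beforehand, means the warning reflects the stall that actually happened.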
Kenneth Graunke [Mon, 10 Apr 2017 05:58:57 +0000 (22:58 -0700)]
i965/drm: Make a set_domain() helper function.
Less boilerplate.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter [Thu, 6 Apr 2017 07:13:47 +0000 (09:13 +0200)]
i965/batch: Ensure we use a consistent offset in relocs
In theory gcc is free to re-load them, and if a concurrent
execbuf races and updates bo->offset64 then we have a problem:
execbuffer api requires that the ->presumed_offset and the one
we used for the reloc matches. It does not require that the value
is sensible, which means no locks needed, just a consistent load.
Ken said his next series will nuke this, so just hand-roll the
kernel's READ_ONCE idea inline.
FIXME: Most callers of brw_emit_reloc recompute the relocation
themselves, which means this doesn't really fix the race. But the long
term plan is to move to per-context relocation handling, which will
fix this all properly. So leave this for now as just a reminder.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
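A minimal sketch of the hand-rolled READ_ONCE idea above: load the offset once through a volatile-qualified pointer, then use that single value both for presumed_offset and for the address written into the batch. The structs and function names are hypothetical stand-ins, not the driver's types.

```c
#include <assert.h>
#include <stdint.h>

struct example_bo {
   uint64_t offset64;   /* may be updated concurrently by another execbuf */
};

struct example_reloc {
   uint64_t presumed_offset;
   uint32_t delta;
};

/* The volatile access stops the compiler from re-loading the value
 * later; it does not add any locking or ordering guarantees. */
static uint64_t
example_read_once_u64(const uint64_t *p)
{
   return *(const volatile uint64_t *)p;
}

/* Record the relocation and return the address to write into the
 * batch; both derive from the same single load. */
static uint64_t
example_emit_reloc(struct example_reloc *reloc,
                   const struct example_bo *target, uint32_t delta)
{
   const uint64_t offset = example_read_once_u64(&target->offset64);

   reloc->presumed_offset = offset;
   reloc->delta = delta;
   return offset + delta;
}
```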
Daniel Vetter [Thu, 6 Apr 2017 06:48:08 +0000 (08:48 +0200)]
i965/bufmgr: Garbage-collect vma cache/pruning
This was done because the kernel has 1 global address space, shared
with all render clients, for gtt mmap offsets, and that address space
was only 32bit on 32bit kernels.
This was fixed in
commit 440fd5283a87345cdd4237bdf45fb01130ea0056
Author: Thierry Reding <treding@nvidia.com>
Date: Fri Jan 23 09:05:06 2015 +0100
drm/mm: Support 4 GiB and larger ranges
which shipped in 4.0. Of course you still want to limit the bo cache
to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
solved by tuning the cache a bit. On 64bit, this was never an issue.
On top, mesa never set this, so it's all dead code. Collect and trash it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Daniel Vetter [Thu, 6 Apr 2017 06:28:51 +0000 (08:28 +0200)]
i965/bufmgr: Remove some reuse functions
is_reusable was needed by uxa because it couldn't keep track of its
scanout buffers and used this as a proxy. Disabling reuse is a silly
idea, we set this once at start. Remove both.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Daniel Vetter [Thu, 6 Apr 2017 06:27:47 +0000 (08:27 +0200)]
i965/bufmgr: remove start_gtt_access
IIRC this was used by uxa for persistent mmaps of the frontbuffer. For
mesa all the set_domain stuff needed before a synchronized mmap is handled
within the bufmgr, so no reason ever to call this.
Inline the implementation into its only internal user.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Daniel Vetter [Thu, 6 Apr 2017 06:25:50 +0000 (08:25 +0200)]
i965/bufmgr: Delete set_tiling
Entirely unused, and really shouldn't be used. The alloc functions already
take care of this. And even in a future where we're not going to
h/v-align tiled buffers in the bufmgr, but only in isl, I think we
still want to adjust the tiling mode in the bufmgr, since that ties in
closely to mmaps and stuff like that.
get_tiling is still needed for the import paths (until we have modifiers
everywhere).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Daniel Vetter [Thu, 6 Apr 2017 06:23:28 +0000 (08:23 +0200)]
i965/bufmgr: Delete alloc_for_render
Entirely unused, mesa instead used the BO_ALLOC_FOR_RENDER flag.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Wed, 5 Apr 2017 21:10:36 +0000 (14:10 -0700)]
i965/drm: Use list_for_each_entry_safe in a couple of cases.
Suggested by Chris Wilson. A tiny bit simpler.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
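For readers unfamiliar with the "safe" iteration pattern named above, here is a stand-alone sketch. It does not use Mesa's list header; the list type, the EXAMPLE_ENTRY macro, and the functions are hypothetical. The point is that the next pointer is saved before the loop body runs, so the body may unlink the current entry.

```c
#include <assert.h>
#include <stddef.h>

struct example_list {
   struct example_list *prev, *next;
};

struct example_bo {
   int id;
   struct example_list link;
};

/* Recover the containing BO from its embedded list node. */
#define EXAMPLE_ENTRY(ptr) \
   ((struct example_bo *)((char *)(ptr) - offsetof(struct example_bo, link)))

static void
example_list_init(struct example_list *head)
{
   head->prev = head->next = head;
}

static void
example_list_addtail(struct example_list *item, struct example_list *head)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

static void
example_list_del(struct example_list *item)
{
   item->prev->next = item->next;
   item->next->prev = item->prev;
   item->prev = item->next = NULL;
}

/* Unlink every BO with an even id. `next` is captured before the body
 * runs, so deleting `pos` does not break the walk. */
static int
example_remove_even(struct example_list *head)
{
   int removed = 0;
   struct example_list *pos = head->next, *next;

   for (next = pos->next; pos != head; pos = next, next = pos->next) {
      if (EXAMPLE_ENTRY(pos)->id % 2 == 0) {
         example_list_del(pos);
         removed++;
      }
   }
   return removed;
}

static int
example_list_length(const struct example_list *head)
{
   int n = 0;
   for (const struct example_list *p = head->next; p != head; p = p->next)
      n++;
   return n;
}
```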
Kenneth Graunke [Tue, 4 Apr 2017 04:01:51 +0000 (21:01 -0700)]
i965/drm: Rename intel_bufmgr_gem.c to brw_bufmgr.c.
Matches the class name and the header file name.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 03:54:16 +0000 (20:54 -0700)]
i965/drm: Reindent intel_bufmgr_gem.c and brw_bufmgr.h.
indent -i3 -nut -br -brs -npcs -ce --no-tabs -Tuint32_t -Tuint64_t
plus some manual fixes because those aren't quite the right settings.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 03:13:08 +0000 (20:13 -0700)]
i965/drm: Rename drm_bacon_bo to brw_bo.
The bacon is all gone.
This renames both the class and the related functions. We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 03:22:59 +0000 (20:22 -0700)]
i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
The stupid reason for eliminating these functions is that I'm about
to rename drm_bacon_bo_map() to brw_bo_map(), which makes the real
function have the short name, rather than the wrapper.
I'm also planning on reworking our mapping code soon, so we use WC
mappings and proper unsynchronized mappings on non-LLC platforms.
It will be easier to do that without thinking about the stall
warnings and wrappers.
My eventual hope is to put the performance warnings in the BO map
function itself, so all callers gain the warning.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 03:14:11 +0000 (20:14 -0700)]
i965/drm: Rename drm_bacon_reg_read() to brw_reg_read().
Less bacon.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 01:10:23 +0000 (18:10 -0700)]
i965/drm: Rename drm_bacon_bufmgr to struct brw_bufmgr.
Also stop using typedefs, per Mesa coding style.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 00:32:19 +0000 (17:32 -0700)]
i965: Just use a uint32_t context handle rather than a malloc'd wrapper.
drm_bacon_context is a malloc'd struct containing a uint32_t context ID
and a pointer back to the bufmgr. The bufmgr pointer is pretty useless,
as everybody already has brw->bufmgr. At that point...we may as well
just use the ctx_id handle directly. A number of places already had to
call drm_bacon_gem_context_get_id() to extract the ID anyway. Now they
just have it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 00:44:57 +0000 (17:44 -0700)]
i965/drm: Fold drm_bacon_gem_reset_stats into the callers.
We're going to get rid of drm_bacon_context shortly, so we'd have to
change the interface slightly. It's basically just an ioctl wrapper
that isn't terribly bufmgr-related, so we may as well just combine it
with the code in brw_reset.c that actually uses it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 00:17:36 +0000 (17:17 -0700)]
i965/drm: Rename drm_bacon_gem_bo_bucket to bo_cache_bucket.
No need for a prefix as this struct is local to the .c file.
Less bacon.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 23:57:44 +0000 (16:57 -0700)]
i965/drm: Drop drm_bacon_* from static functions.
Mesa style is to not use lengthy prefixes for static functions.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 23:55:06 +0000 (16:55 -0700)]
i965/drm: Drop drm_bacon_gem_bo_madvise_internal().
The only difference is that it takes an explicit bufmgr rather than
using bo->bufmgr, but there is only one bufmgr per screen so they
should be identical anyway.
Chris says this was added primarily to avoid bo/bo_gem casting,
which was inconvenient.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 22:55:16 +0000 (15:55 -0700)]
i965/drm: Merge drm_bacon_bo_gem into drm_bacon_bo.
The separate class gives us a bit of extra encapsulation, but I don't
know that it's really worth the boilerplate. I think we can reasonably
expect the rest of the driver to be responsible.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 22:39:09 +0000 (15:39 -0700)]
i965/drm: Merge bo->handle and bo_gem->gem_handle.
These fields are the same value. In the bad old days, bo->handle could
have been an identifier from the pre-GEM fake bufmgr, but that's long
gone. Keep the "gem_handle" name for clarity.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 29 Mar 2017 03:20:00 +0000 (20:20 -0700)]
i965/drm: Rewrite relocation handling.
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch. (The batch buffer itself
must be the last entry in this list.) Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.
This is a very general mechanism, allowing every BO to contain pointers
to other BOs. libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs. Together, these form "reloc trees".
Processing relocations involves a depth-first-search of the relocation
trees, starting from the batch buffer. Care has to be taken not to
double-visit buffers. Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present. Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks to try and perform less expensive estimates.
For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as a local variable in the
execbuffer function, requiring locking for no good reason.
It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max. This meant the first relocation
from a BO would allocate 180kB of data structures!
This is way too complicated for our needs. i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO. No other buffer uses relocations. This means we
can have a single relocation list for the batchbuffer. We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.
This patch overhauls the relocation system to do exactly that. There
are many nice benefits:
- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
I915_EXEC_NO_RELOC in a future patch.
Improves performance in Synmark 7.0's OglBatch7:
- Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
- Apollolake: 3.89245% +/- 0.598945% (n=35)
Improves performance in GFXBench4's gl_driver2 test:
- Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
- Apollolake: 4.1776% +/- 0.240847% (n=120)
v2: Feedback from Chris Wilson:
- Omit explicit zero initializers for garbage execbuf fields.
- Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
- Drop unnecessary fencing assertions.
- Only use _WR variant of execbuf ioctl when necessary.
- Shrink the arrays to be smaller by default.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
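The flat scheme described above can be sketched in a few dozen lines. This is an illustrative model under stated assumptions, not the driver's code: all struct and function names are hypothetical, the relocation record is reduced to three fields, and the step of appending the batch BO itself as the last validation entry at submit time is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct example_bo {
   uint32_t gem_handle;
   uint64_t size;
};

struct example_reloc {
   uint32_t target_handle;   /* BO the pointer refers to */
   uint32_t batch_offset;    /* where in the batch the pointer lives */
   uint32_t delta;           /* offset into the target BO */
};

struct example_batch {
   struct example_bo **exec_bos;   /* validation list, used as a set */
   int exec_count, exec_capacity;
   struct example_reloc *relocs;   /* one flat relocation list */
   int reloc_count, reloc_capacity;
   uint64_t aperture_space;        /* running total of exec_bos sizes */
};

/* Validation list walk: does the batch already reference this BO? */
static bool
example_batch_references(const struct example_batch *batch,
                         const struct example_bo *bo)
{
   for (int i = 0; i < batch->exec_count; i++) {
      if (batch->exec_bos[i] == bo)
         return true;
   }
   return false;
}

/* Add the BO the first time it is seen and bump the aperture tally. */
static bool
example_add_exec_bo(struct example_batch *batch, struct example_bo *bo)
{
   if (example_batch_references(batch, bo))
      return true;

   if (batch->exec_count == batch->exec_capacity) {
      int cap = batch->exec_capacity ? batch->exec_capacity * 2 : 4;
      void *p = realloc(batch->exec_bos, cap * sizeof(*batch->exec_bos));
      if (!p)
         return false;
      batch->exec_bos = p;
      batch->exec_capacity = cap;
   }
   batch->exec_bos[batch->exec_count++] = bo;
   batch->aperture_space += bo->size;
   return true;
}

/* Record a pointer from the batch into `target`. */
static bool
example_emit_reloc(struct example_batch *batch, struct example_bo *target,
                   uint32_t batch_offset, uint32_t delta)
{
   if (!example_add_exec_bo(batch, target))
      return false;

   if (batch->reloc_count == batch->reloc_capacity) {
      int cap = batch->reloc_capacity ? batch->reloc_capacity * 2 : 4;
      void *p = realloc(batch->relocs, cap * sizeof(*batch->relocs));
      if (!p)
         return false;
      batch->relocs = p;
      batch->reloc_capacity = cap;
   }
   batch->relocs[batch->reloc_count++] =
      (struct example_reloc){ target->gem_handle, batch_offset, delta };
   return true;
}

/* The aperture check is plain arithmetic: a + b <= c. */
static bool
example_fits_aperture(const struct example_batch *batch, uint64_t extra,
                      uint64_t threshold)
{
   return batch->aperture_space + extra <= threshold;
}

static void
example_batch_free(struct example_batch *batch)
{
   free(batch->exec_bos);
   free(batch->relocs);
}
```

Both arrays start small and double on demand, and nothing here is shared between contexts, which is why no lock is needed in this model.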
Kenneth Graunke [Tue, 28 Mar 2017 23:13:41 +0000 (16:13 -0700)]
i965/drm: Make register write check handle execbuffer directly.
I'm about to rewrite how relocation handling works, at which point
drm_bacon_bo_emit_reloc() and drm_bacon_bo_mrb_exec() won't exist
anymore. This code is already largely not using the batchbuffer
infrastructure, so just go all the way and handle relocations, the
validation list, and execbuffer ourselves. That way, we don't have
to think the weird case where we only have a screen, and no context,
when redesigning the relocation handling.
v2: Write reloc.presumed_offset + reloc.delta into the batch, rather
than duplicating the comment, so it's obvious that they match
(suggested by Chris). Also add a comment about why we don't do
any error checking.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 31 Mar 2017 05:27:42 +0000 (22:27 -0700)]
i965: Make a screen::aperture_threshold field.
This is the threshold after which drm_intel_bufmgr_check_aperture_space
returns -ENOSPC, signalling that it thinks an execbuf is likely to fail
and we need to roll back and flush the batch.
We'll need this when we rewrite aperture space checking, shortly.
In the meantime, we can also use it in GLX_MESA_query_renderer.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 28 Mar 2017 23:49:35 +0000 (16:49 -0700)]
i965: Make/use a brw_batch_references() wrapper.
We'll want to change the implementation of this shortly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 28 Mar 2017 21:41:39 +0000 (14:41 -0700)]
i965: Use brw_emit_reloc() instead of drm_bacon_bo_emit_reloc().
I'm about to make brw_emit_reloc do actual work, so everybody needs
to start using it and not the raw drm_bacon function.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 28 Mar 2017 22:03:49 +0000 (15:03 -0700)]
i965: Change intel_batchbuffer_reloc() into brw_emit_reloc().
This renames intel_batchbuffer_reloc to brw_emit_reloc and changes the
parameter naming and ordering to match drm_intel_bo_emit_reloc().
For now, it's a trivial wrapper that accesses batch->bo. When we
rework relocations, it will start doing actual work.
target_offset should be expanded to a uint64_t to match the kernel,
but for now we leave it as its original 32-bit type.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 06:35:27 +0000 (23:35 -0700)]
i965/drm: Drop GEM_SW_FINISH stuff.
This is only useful when doing an incoherent CPU mapping of the current
scanout buffer. That's a terrible plan, so we never do it. We always
use an uncached GTT map.
So, this is useless. Drop the code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 31 Mar 2017 16:44:54 +0000 (09:44 -0700)]
i965/drm: Drop code to search for an existing bufmgr.
This functionality was added by libdrm commit
743af59669386cb6e063fa4bd85f0a0b2da86295 (intel: make bufmgr_gem
shareable from different API) in an attempt to solve libva/mesa buffer
sharing problems. Specifically, this was working around an issue hit
by Chromium, which used the same drm_fd for multiple APIs, and shared
buffers between them.
This code attempted to work around that issue by using the same bufmgr
for both libva and Mesa. It worked because libdrm_intel was loaded by
both libraries. However, now that Mesa has forked, we don't have a
common library, and this code cannot work.
The correct solution is to have each API open its own file descriptor
(and get a corresponding buffer manager), and then use PRIME export
and import to share BOs across those APIs. Then the kernel can manage
those shared resources. According to Chris, the kernel will pass back
the same handle for a prime FD if the lookup is from the same device FD.
We believe Chromium has since moved to this model.
In Mesa, there is already only one screen per FD, and so there will
only be one bufmgr per FD. We don't need any of this code.
v2: Add a big warning comment written by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 29 Mar 2017 00:25:27 +0000 (17:25 -0700)]
i965/drm: Unwrap the unnecessary drm_bacon_reloc_target_info struct.
This used to have another field, but now it's just a BO pointer.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 23 Mar 2017 00:29:39 +0000 (17:29 -0700)]
i965/drm: Switch from uthash to Mesa's hash table.
No performance data has been gathered about this choice. I just don't
want that many hash tables. Chris points out that this is not
performance critical - we should not be recreating that many handles
from scratch. In the past we used a linear list, which became
unreasonable in stress tests that used hundreds of thousands of BOs.
In real usage, it shouldn't matter that much.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 07:42:30 +0000 (00:42 -0700)]
i965/drm: Drop bo_gem::kflags.
It's always zero now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 3 Apr 2017 06:17:54 +0000 (23:17 -0700)]
i965/drm: Drop has_exec_async related API.
Mesa doesn't use this yet. We'll almost certainly want to, but we can
add the functionality back after we clean up the messy drm code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 19:46:08 +0000 (12:46 -0700)]
i965/drm: Drop softpin support for now.
We may want this eventually, but simplify for now. We can add it back
later when we actually intend to use it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 19:42:43 +0000 (12:42 -0700)]
i965/drm: Drop userptr support for now.
We'll want userptr support for GL_AMD_pinned_memory someday,
and possibly some other upload optimizations. Chris says "not in this
form" though. Drop it and simplify for now - we can add it back later
when we're ready to hook it up fully.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 19:39:44 +0000 (12:39 -0700)]
i965/drm: Delete engine checks.
This is basically handholding to prevent a bogus caller from trying to
execbuffer on a bogus engine. i965 already does this correctly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 18:45:08 +0000 (11:45 -0700)]
i965/drm: Drop intel_chipset.h in favor of using gen_device_info.
This moves the PCI ID detection to intel_screen.c and makes
drm_bacon_bufmgr_gem_init() take a devinfo pointer.
We also drop the HAS_LLC query stuff - devinfo has that info already,
without kernel queries, and it makes no sense to have two has_llc flags
set by different mechanisms.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 18:39:39 +0000 (11:39 -0700)]
i965/drm: Drop deprecated drm_bacon_bo::offset.
This field was the wrong size, so we replaced it with offset64.
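As a rough illustration of the size problem, the sketch below models a narrow offset field as a 32-bit integer (which is what unsigned long amounts to on 32-bit builds) next to a 64-bit one. The struct and function names are hypothetical and are not the drm_bacon definitions; the 48-bit address limit is likewise an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a BO carrying both fields; not the real struct. */
struct example_bo {
   uint32_t offset;    /* models a field that is only 32 bits wide */
   uint64_t offset64;  /* wide enough for a 48-bit GPU address */
};

/* Record a GPU address in both fields; the narrow one silently truncates. */
static void
example_bo_set_offset(struct example_bo *bo, uint64_t gpu_address)
{
   /* Assumption for this sketch: addresses fit in 48 bits. */
   assert(gpu_address < (1ull << 48));
   bo->offset = (uint32_t) gpu_address;
   bo->offset64 = gpu_address;
}

/* Returns nonzero when the narrow field lost the upper address bits. */
static int
example_bo_offset_truncated(const struct example_bo *bo)
{
   return (uint64_t) bo->offset != bo->offset64;
}
```

Any address at or above 4GB makes the two fields disagree, which is why keeping only the 64-bit field is the safer choice.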
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 22:28:07 +0000 (15:28 -0700)]
i965/drm: Drop has_wait_timeout.
The wait-ioctl was introduced in kernel v3.6 (20120930) and that is our
current minimum requirement for screen creation.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 24 Mar 2017 04:34:23 +0000 (21:34 -0700)]
i965/drm: Assume aperture size query will work.
This query has been available since 2.6.28. We require 3.6.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 07:50:21 +0000 (00:50 -0700)]
i965/drm: Combine drm_bacon_bufmgr_gem and drm_bacon_bufmgr classes.
The distinction was required when the bufmgr was virtualised. Now that
there is only one class, we no longer need the distraction of pretending
it is a subclass.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 30 Mar 2017 21:59:23 +0000 (14:59 -0700)]
i965/drm: Move _drm_bacon_context to intel_bufmgr_gem.c.
This moves us one step closer to killing off intel_bufmgr_priv.h.
We might want to nuke it altogether, since it's basically just a
uint32_t handle, but for now, let's focus on removing files.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Apr 2017 21:51:52 +0000 (14:51 -0700)]
i965/drm: Drop cliprects and dr4 from execbuf variants.
Legacy DRI1 leftovers.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 00:27:25 +0000 (17:27 -0700)]
i965/drm: Devirtualize the bufmgr.
libdrm_bacon used to have a GEM-based bufmgr and a legacy fake bufmgr,
but that's long since dead (and we never imported it to i965). So,
drop the extra layer of function pointers.
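A minimal sketch of what devirtualizing means in general, assuming a single remaining implementation. All names here are hypothetical and only illustrate the pattern; they are not the drm_bacon API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct example_bo {
   uint64_t size;
};

/* Before: every operation goes through a table of function pointers. */
struct example_bufmgr_vtbl {
   uint64_t (*bo_size)(const struct example_bo *bo);
};

/* The only implementation left once the alternative backend is gone. */
static uint64_t
example_gem_bo_size(const struct example_bo *bo)
{
   assert(bo != NULL);
   return bo->size;
}

static const struct example_bufmgr_vtbl example_gem_vtbl = {
   .bo_size = example_gem_bo_size,
};

/* Virtualized entry point: an indirect call through the table. */
static uint64_t
example_bo_size_virtual(const struct example_bufmgr_vtbl *vtbl,
                        const struct example_bo *bo)
{
   return vtbl->bo_size(bo);
}

/* Devirtualized entry point: call the single implementation directly. */
static uint64_t
example_bo_size_direct(const struct example_bo *bo)
{
   return example_gem_bo_size(bo);
}
```

Both entry points return the same result; the direct form simply removes the table and the indirection.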
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 05:40:04 +0000 (22:40 -0700)]
i965/drm: Check INTEL_DEBUG & DEBUG_BUFMGR directly.
Eliminates some API around this, and more importantly, the last
field in one bufmgr class.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 05:42:43 +0000 (22:42 -0700)]
i965/drm: Use Mesa's macros.h instead of duplicating them.
Replace the duplicated macros imported from libdrm:
ARRAY_SIZE, MAX2, ALIGN, STATIC_ASSERT
and remove unused ROUND_UP_TO and ROUND_UP_TO_MB.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 18:12:27 +0000 (11:12 -0700)]
i965/drm: Use ALIGN, not ROUND_UP_TO.
ROUND_UP_TO handles a NPOT alignment, but all the alignments we use
are power of two anyway, so there's no need.
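The difference between the two rounding styles can be sketched as follows. These are hypothetical functions written for illustration, not the macro definitions from Mesa or libdrm: one uses a bit mask and is only valid for power-of-two alignments, the other uses divide and multiply and accepts any nonzero alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Round up using a bit mask; only valid for power-of-two alignments. */
static uint64_t
example_align_pot(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Round up using divide and multiply; works for any nonzero alignment. */
static uint64_t
example_round_up_to(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0);
   return (value + alignment - 1) / alignment * alignment;
}
```

For power-of-two alignments both functions agree, so the mask form can replace the general one without changing results.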
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Wed, 22 Mar 2017 00:31:24 +0000 (17:31 -0700)]
i965/drm: Delete execbuf1 support.
execbuf2 has been around since v2.6.33. We require v3.6.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 30 Mar 2017 18:46:05 +0000 (11:46 -0700)]
i965/drm: Remove Gen2-3 fence accounting.
Since gen4, we do not use fence registers for any GPU access and so
never have to account for the fence during batch construction. All the
related fence functions are unused.
Based on Kristian Høgsberg's patch; commit message by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 24 Mar 2017 04:19:24 +0000 (21:19 -0700)]
i965/drm: Remove some unused functions and macros.
Mesa doesn't use these functions or macros, so we can delete them,
and save work refactoring and cleaning them up. We'll delete a lot
more later, too.
Based on a patch by Kristian Høgsberg.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 21 Mar 2017 23:10:08 +0000 (16:10 -0700)]
i965/drm: Switch to util/list.h instead of libdrm_lists.h.
Both are kernel style lists, so this is trivial.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 21 Mar 2017 21:54:07 +0000 (14:54 -0700)]
i965/drm: Port to Mesa's atomic header.
Drop xf86atomic.h in favor of Mesa's util/u_atomic.h. We replace the
atomic_t wrapper struct with a bare integer, switch to the 'p_atomic'
naming conventions, and move over the one extra helper.
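To show the shape of the change, here is a small sketch of a reference count stored in a wrapper struct versus directly in an integer field. It uses C11 <stdatomic.h> as a stand-in rather than Mesa's util/u_atomic.h, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Before: the counter is hidden inside a wrapper struct. */
typedef struct {
   atomic_int atomic;
} example_atomic_t;

struct example_bo_wrapped {
   example_atomic_t refcount;
};

/* After: the counter is an integer field, updated through helpers. */
struct example_bo_bare {
   atomic_int refcount;
};

/* Takes a reference. */
static void
example_ref(struct example_bo_bare *bo)
{
   atomic_fetch_add(&bo->refcount, 1);
}

/* Drops a reference; returns nonzero when the count reached zero. */
static int
example_unref(struct example_bo_bare *bo)
{
   int old = atomic_fetch_sub(&bo->refcount, 1);
   assert(old > 0);
   return old == 1;
}
```

Dropping the wrapper means callers pass a pointer to the integer itself instead of reaching through an extra struct member.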
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 20 Mar 2017 23:42:55 +0000 (16:42 -0700)]
i965/drm: Use our internal libdrm (drm_bacon) rather than the real one.
Now we can actually test our changes.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 23 Mar 2017 01:51:42 +0000 (18:51 -0700)]
i965/drm: s/drm_intel/drm_bacon/g
Using drm_intel_* as a prefix is hazardous - we don't want to conflict
with the actual libdrm_intel symbols. In particular, I think we could
get into trouble during the final megadrivers linking.
So, rename everything to a different yet arbitrary prefix. bacon and
intel are the same number of characters, so we don't have to reindent
the world. It's also an homage to Ian's "Bacon Trail" platform.
I was going to use "drm_relic" to poke fun at libdrm being ancient,
and so we could explain the name with a "historical reasons" pun,
but it sounds too much like ralloc.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Thu, 23 Mar 2017 22:29:43 +0000 (15:29 -0700)]
i965/drm: Drop libpciaccess dependencies.
i965 doesn't use drm_intel_get_aperture_sizes(), so we can delete
support for it. This avoids a build dependency on libpciaccess.
Chris also notes:
"There's a really old bug that hopefully has been closed already
(although as far as I can tell, it has never been fixed) about
how using libpciaccess from libdrm_intel breaks the world (since
libpciaccess uses a singleton that is torn down at the first request
rather than upon the last user)."
This bug should go away in two commits when we switch over to our
internal copy of libdrm_intel.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84325
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 21 Mar 2017 22:55:29 +0000 (15:55 -0700)]
i965/drm: Make libdrm_lists.h compile by defining typeof.
typeof doesn't seem to exist, so this won't compile (but we don't yet
try). Define it to __typeof__. This code is going to die soon anyway.
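A minimal sketch of the workaround, assuming GCC or Clang, where __typeof__ is accepted even in strict ISO modes that do not recognize plain typeof. The list node, item struct, and macro below are hypothetical stand-ins and are not copied from libdrm_lists.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the compiler provides __typeof__ (GCC and Clang do). */
#ifndef typeof
#define typeof __typeof__
#endif

/* Hypothetical kernel-style list node. */
struct example_list {
   struct example_list *prev, *next;
};

/* Recover the containing struct from a pointer to its list member.
 * 'sample' is only used for its type and is never dereferenced. */
#define example_container_of(ptr, sample, member) \
   ((typeof(sample)) ((char *) (ptr) - offsetof(typeof(*(sample)), member)))

struct example_item {
   int value;
   struct example_list link;
};

static struct example_item *
example_item_from_link(struct example_list *link)
{
   struct example_item *sample = NULL;
   assert(link != NULL);
   return example_container_of(link, sample, link);
}
```

With the define in place, list macros that spell the operator as typeof compile unchanged under a strict language standard.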
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Fri, 31 Mar 2017 17:06:24 +0000 (10:06 -0700)]
i965/drm: remove legacy defines, aub functions, and decoder prototypes
We never imported any of this code, so drop the prototypes, unused
enums, and defines.
Based on patches by Emil Velikov.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>