From: Francisco Jerez
Date: Wed, 19 Feb 2020 04:48:23 +0000 (-0800)
Subject: iris: Introduce cache coherency matrix for batch-local memory ordering.
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=fc221875cf1fe546e0087aeef55ca976647ef9c2;p=mesa.git

iris: Introduce cache coherency matrix for batch-local memory ordering.

This introduces a representation of the cache coherency status of the
GPU at any point in the batch.  This is done by defining a matrix C of
synchronization sequence numbers such that, at any point of batch
construction, a memory operation introduced into the batch from domain
i is guaranteed to be ordered after any memory operation from domain j
in a previous batch section with seqno n whenever the following
condition holds:

   C[i][j] >= n

This allows us to efficiently determine whether additional flushing
and/or invalidation is required in order to access a buffer object
from some arbitrary domain.

Except for batch buffer reset, which requires clearing the whole
matrix, all operations on the matrix are either O(n) or O(1) in the
number of caching domains (which is effectively constant).

Reviewed-by: Kenneth Graunke
Part-of:
---

diff --git a/src/gallium/drivers/iris/iris_batch.c b/src/gallium/drivers/iris/iris_batch.c
index a3455933471..a6657a373f5 100644
--- a/src/gallium/drivers/iris/iris_batch.c
+++ b/src/gallium/drivers/iris/iris_batch.c
@@ -405,6 +405,7 @@ iris_batch_reset(struct iris_batch *batch)
 
    assert(!batch->sync_region_depth);
    iris_batch_sync_boundary(batch);
+   iris_batch_mark_reset_sync(batch);
 
    /* Always add the workaround BO, it contains a driver identifier at the
     * beginning quite helpful to debug error states.
diff --git a/src/gallium/drivers/iris/iris_batch.h b/src/gallium/drivers/iris/iris_batch.h
index 00f62f2fb6f..a056730130c 100644
--- a/src/gallium/drivers/iris/iris_batch.h
+++ b/src/gallium/drivers/iris/iris_batch.h
@@ -144,6 +144,18 @@ struct iris_batch {
    struct gen_batch_decode_ctx decoder;
    struct hash_table_u64 *state_sizes;
 
+   /**
+    * Matrix representation of the cache coherency status of the GPU at the
+    * current end point of the batch.  For every i and j,
+    * coherent_seqnos[i][j] denotes the seqno of the most recent flush of
+    * cache domain j visible to cache domain i (which obviously implies
+    * that coherent_seqnos[i][i] is the most recent flush of cache domain
+    * i).  This can be used to efficiently determine whether
+    * synchronization is necessary before accessing data from cache domain
+    * i if it was previously accessed from another cache domain j.
+    */
+   uint64_t coherent_seqnos[NUM_IRIS_DOMAINS][NUM_IRIS_DOMAINS];
+
    /**
     * Sequence number used to track the completion of any subsequent memory
     * operations in the batch until the next sync boundary.
@@ -313,4 +325,41 @@ iris_batch_sync_boundary(struct iris_batch *batch)
    }
 }
 
+/**
+ * Update the cache coherency status of the batch to reflect a flush of the
+ * specified caching domain.
+ */
+static inline void
+iris_batch_mark_flush_sync(struct iris_batch *batch,
+                           enum iris_domain access)
+{
+   batch->coherent_seqnos[access][access] = batch->next_seqno - 1;
+}
+
+/**
+ * Update the cache coherency status of the batch to reflect an
+ * invalidation of the specified caching domain.  All prior flushes of
+ * other caches will be considered visible to the specified caching domain.
+ */
+static inline void
+iris_batch_mark_invalidate_sync(struct iris_batch *batch,
+                                enum iris_domain access)
+{
+   for (unsigned i = 0; i < NUM_IRIS_DOMAINS; i++)
+      batch->coherent_seqnos[access][i] = batch->coherent_seqnos[i][i];
+}
+
+/**
+ * Update the cache coherency status of the batch to reflect a reset.  All
+ * previously accessed data can be considered visible to every caching
+ * domain thanks to the kernel's heavyweight flushing at batch buffer
+ * boundaries.
+ */
+static inline void
+iris_batch_mark_reset_sync(struct iris_batch *batch)
+{
+   for (unsigned i = 0; i < NUM_IRIS_DOMAINS; i++)
+      for (unsigned j = 0; j < NUM_IRIS_DOMAINS; j++)
+         batch->coherent_seqnos[i][j] = batch->next_seqno - 1;
+}
+
 #endif
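
For illustration only (not part of this patch): a minimal sketch of how
the matrix could be consulted to decide whether a read from one domain
needs extra synchronization.  It assumes a hypothetical per-buffer array
last_write_seqnos[] holding the seqno of the most recent write from each
domain; that array and the helper below are assumptions for the sketch,
not part of the driver's actual API.

#include <stdbool.h>
#include <stdint.h>
#include "iris_batch.h"  /* struct iris_batch, enum iris_domain, NUM_IRIS_DOMAINS */

/* Hypothetical helper: return true if data last written from some domain j
 * is not yet guaranteed to be visible to reads from domain `access`, i.e.
 * if the condition coherent_seqnos[access][j] >= last_write_seqnos[j] fails
 * for any j.  `last_write_seqnos` is assumed per-buffer bookkeeping, not a
 * field introduced by this patch.
 */
static bool
needs_sync_for_read(const struct iris_batch *batch,
                    const uint64_t last_write_seqnos[NUM_IRIS_DOMAINS],
                    enum iris_domain access)
{
   for (unsigned j = 0; j < NUM_IRIS_DOMAINS; j++) {
      if (batch->coherent_seqnos[access][j] < last_write_seqnos[j])
         return true;
   }

   return false;
}

When synchronization does turn out to be necessary, the batch would emit
the appropriate flush and/or invalidation and record it with
iris_batch_mark_flush_sync() and iris_batch_mark_invalidate_sync(); like
the check above, each of those updates is O(1) or O(NUM_IRIS_DOMAINS),
matching the complexity claim in the commit message.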