From: Kenneth Graunke Date: Wed, 25 Jan 2017 08:59:42 +0000 (-0800) Subject: i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug. X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=7c5629a26912af9164bda17b7af5370f6b4302e4;p=mesa.git i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug. Applications may delete a shader program, create a new one, and bind it before the next draw. With terrible luck, malloc may randomly return a chunk of memory for the new gl_program that happened to be the exact same pointer as our previously bound gl_program. In this case, our logic to detect new programs in brw_upload_pipeline_state() would break: if (brw->vertex_program != ctx->VertexProgram._Current) { brw->vertex_program = ctx->VertexProgram._Current; brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM; } Because the pointer is the same, we'd think it was the same program. But it could be wildly different - a different stage altogether, different sets of resources, and so on. This causes utter chaos. As unlikely as this seems, I believe I hit this when running a subset of the CTS in a loop, in a group of tests that churns through simple programs, deleting and rebuilding them. Presumably malloc uses a bucketing cache of sorts, and so freeing up a gl_program and allocating a new one fairly quickly causes it to reuse that memory. The result was that brw->vertex_program->info.num_ssbos claimed the program had SSBOs, while brw->vs.base.prog_data.binding_table claimed that there were none. This was crazy, because the binding table is calculated from info.num_ssbos - the shader info appeared to change between shader compile time and draw time. Careful use of watchpoints revealed that it was being clobbered by rzalloc's memset when building an entirely different program... Fortunately, our 0xd0d0d0d0 canary for unused binding table entries caused us to crash out of bounds when trying to upload SSBOs, or we may have never discovered this heisenbug. Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5 at 64x64 in a loop with 100 iterations. Cc: "17.0 13.0 12.0" Signed-off-by: Kenneth Graunke Reviewed-by: Timothy Arceri Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand --- diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index e81f6b15c0a..673dc502ad4 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -177,6 +177,49 @@ static struct gl_program *brwNewProgram(struct gl_context *ctx, GLenum target, static void brwDeleteProgram( struct gl_context *ctx, struct gl_program *prog ) { + struct brw_context *brw = brw_context(ctx); + + /* Beware! prog's refcount has reached zero, and it's about to be freed. + * + * In brw_upload_pipeline_state(), we compare brw->foo_program to + * ctx->FooProgram._Current, and flag BRW_NEW_FOO_PROGRAM if the + * pointer has changed. + * + * We cannot leave brw->foo_program as a dangling pointer to the dead + * program. malloc() may allocate the same memory for a new gl_program, + * causing us to see matching pointers...but totally different programs. + * + * We cannot set brw->foo_program to NULL, either. If we've deleted the + * active program, Mesa may set ctx->FooProgram._Current to NULL. That + * would cause us to see matching pointers (NULL == NULL), and fail to + * detect that a program has changed since our last draw. + * + * So, set it to a bogus gl_program pointer that will never match, + * causing us to properly reevaluate the state on our next draw. + * + * Getting this wrong causes heisenbugs which are very hard to catch, + * as you need a very specific allocation pattern to hit the problem. + */ + static const struct gl_program deleted_program; + + if (brw->vertex_program == prog) + brw->vertex_program = &deleted_program; + + if (brw->tess_ctrl_program == prog) + brw->tess_ctrl_program = &deleted_program; + + if (brw->tess_eval_program == prog) + brw->tess_eval_program = &deleted_program; + + if (brw->geometry_program == prog) + brw->geometry_program = &deleted_program; + + if (brw->fragment_program == prog) + brw->fragment_program = &deleted_program; + + if (brw->compute_program == prog) + brw->compute_program = &deleted_program; + _mesa_delete_program( ctx, prog ); }