radeonsi: Use buffer loads and stores for passing data from TCS to TES.
We always try to use 4-component loads, as LLVM does not combine loads
and they bypass the L1 cache.
We can't use a similar strategy for stores and this is especially
notable with the tess factors, as they are often set with separate
MOV's per component in the TGSI.
We keep storing to LDS and the LDS space, so we can load the outputs
later, either due to the shader, of for wrting the tess factors.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>