radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI