radeonsi: split input upload off from si_launch_grid
Also allocate the input buffer dynamically with u_upload_alloc.
The old buffer-per-program approach required serializing all
dispatches of the same program.
v2: - Clarified commit message.
    - Use radeon_set_sh_reg_seq.
    - Also upload input buffer for clover kernels, even when
      input_size is 0, as it contains grid parameters.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>