intel/nir: Lower load_num_work_groups to 32-bit if needed