aco: implement 64-bit integer reductions
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.
No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.
v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)