Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
triop("imad24_ir3", tint32, _2src_commutative,
"(((int32_t)src0 << 8) >> 8) * (((int32_t)src1 << 8) >> 8) + src2")
+# 24b multiply into 32b result (with sign extension)
+binop("imul24", tint32, _2src_commutative + associative,
+ "(((int32_t)src0 << 8) >> 8) * (((int32_t)src1 << 8) >> 8)")