rs6000: Add MMA built-in function definitions and test cases.
Add the Matrix-Multiply Assist (MMA) built-ins. The MMA accumulators are
INOUT operands for most MMA instructions, but they are also very expensive
to move around. For this reason, we have implemented a built-in API where
the accumulators are passed using pass-by-reference/pointers, so the user
won't use one accumulator as input and another as output, which wouldentail
a lot of copies. However, using pointers gives us poor code generation
when we expand the built-ins at normal expand time. We therefore expand
the MMA built-ins early into gimple, converting the pass-by-reference calls
to an internal built-in that uses pass-by-value calling convention, where
we can enforce the input and output accumulators are the same. This gives
us much better code generation.
2020-06-20 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/predicates.md (mma_assemble_input_operand): New.
* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
built-in functions.
(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
* config/rs6000/rs6000.c (rs6000_emit_move): Use CONST_INT_P.
Allow zero constants.
(print_operand) <case 'A'>: New output modifier.
(rs6000_split_multireg_move): Add support for inserting accumulator
priming and depriming instructions. Add support for splitting an
assemble accumulator pattern.
* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
rs6000_gimple_fold_mma_builtin): New functions.
(RS6000_BUILTIN_M): New macro.
(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
(bdesc_mma): Add new MMA built-in support.
(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
RS6000_BTM_MMA.
(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
and rs6000_gimple_fold_mma_builtin.
(rs6000_expand_builtin): Call mma_expand_builtin.
Use RS6000_BTC_OPND_MASK.
(rs6000_init_builtins): Adjust comment. Call mma_init_builtins.
(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
VSX_BUILTIN_XVCVBF16SP.
* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
MMA_AVVI4I4I4): New define_int_iterator.
(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
avvi4i4i4): New define_int_attr.
(*movpxi): Add zero constant alternative.
(mma_assemble_pair, mma_assemble_acc): New define_expand.
(*mma_assemble_acc): New define_insn_and_split.
(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
* config/rs6000/rs6000.md (define_attr "type"): New type mma.
* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
(UNSPEC_VSX_XVCVSPBF16): Likewise.
(XVCVBF16): New define_int_iterator.
(xvcvbf16): New define_int_attr.
(vsx_<xvcvbf16>): New define_insn.
* doc/extend.texi: Document the mma built-ins.
15 files changed: