PR target/95294: VAX: Convert backend to MODE_CC representation
In the VAX ISA the INSV bit-field insert instruction is the only
computational operation that keeps the condition codes, held in the PSL
or Processor Status Longword register, intact.  The instruction is
flexible enough that it could potentially be used for data moves
post-reload, but reportedly it is not the best choice performance-wise,
and we have no addition operation available that would leave the
condition codes unchanged.  Furthermore, as usual with a complex CISC
ISA, for many operations we have several machine instructions or
instruction sequences to choose from that set the condition codes in
different ways.
Use the approach then where the condition codes only get introduced by
reload, by defining insn splitters for RTL insns that change the
condition codes in some way, by default considering them clobbered.
Then, to keep the generated code from regressing too much, provide
insns that include a `compare' operation setting the condition codes in
parallel to the main operation.  The manner in which the condition
codes are set by each insn is conveyed by whatever the SELECT_CC_MODE
macro expands to.
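As an illustration of the shapes involved, the patterns for SImode
addition might be arranged as follows; this is a schematic sketch
rather than the actual patterns from this change, with `VAX_PSL_REGNUM'
assumed to denote the hard register number assigned to PSL and with
CCNZmode chosen arbitrarily for the example:

  ;; The named pattern carries no condition code information; the
  ;; splitter runs at reload and introduces a PSL clobber, so the
  ;; middle-end optimizers never see any condition code references.
  (define_insn_and_split "addsi3"
    [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
          (plus:SI (match_operand:SI 1 "general_operand" "g")
                   (match_operand:SI 2 "general_operand" "g")))]
    ""
    "#"
    "reload_completed"
    [(parallel
       [(set (match_dup 0)
             (plus:SI (match_dup 1) (match_dup 2)))
        (clobber (reg:CC VAX_PSL_REGNUM))])])

  ;; The default post-reload form: condition codes clobbered.
  (define_insn "*addsi3"
    [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
          (plus:SI (match_operand:SI 1 "general_operand" "g")
                   (match_operand:SI 2 "general_operand" "g")))
     (clobber (reg:CC VAX_PSL_REGNUM))]
    "reload_completed"
    "addl3 %1,%2,%0")

  ;; The parallel-compare form the comparison elimination pass may
  ;; substitute, with the condition codes set by the operation.
  (define_insn "*addsi3_ccnz"
    [(set (reg:CCNZ VAX_PSL_REGNUM)
          (compare:CCNZ
            (plus:SI (match_operand:SI 1 "general_operand" "g")
                     (match_operand:SI 2 "general_operand" "g"))
            (const_int 0)))
     (set (match_operand:SI 0 "nonimmediate_operand" "=g")
          (plus:SI (match_dup 1) (match_dup 2)))]
    "reload_completed"
    "addl3 %1,%2,%0")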
Given that individual patterns provided for the same RTL basic
operation may set the condition codes differently, keeping this
information away from the insn patterns themselves would cause a
maintenance nightmare and would be bound to fail in a horrible way
sooner or later.  Therefore instead let the patterns themselves choose
which condition modes they support, by having one or more subst
iterators applied, and then have the individual comparison operators
each require the specific condition mode matching the codes used by the
operation.
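For example, the predicate for the operators usable where only the N
and Z condition codes carry valid information might look along these
lines (a sketch only; the actual new predicates are listed in the
ChangeLog below):

  ;; Comparison operators whose outcome can be inferred from the
  ;; N and Z condition codes alone; the branch pattern this is
  ;; used with is expected to pin the PSL register to CCNZmode.
  (define_predicate "vax_ccnz_comparison_operator"
    (match_code "eq,ne,ge,lt"))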
While subst iterators only support one alternative each, there is
actually no problem with applying multiple ones to a single insn, with
the result as intended, and if the corresponding subst attribute
supplies an empty NO-SUBST-VALUE, no clutter in pattern names results
either.  Make use of this observation.
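Schematically, such a subst iterator and its attribute might be written
as follows (again a sketch rather than the actual iterators from this
change; it rewrites the default clobber form into the parallel-compare
form shown above):

  ;; Attach a CCNZ setter to a pattern having the default PSL
  ;; clobber; the empty NO-SUBST-VALUE keeps the name of the
  ;; unsubstituted variant unchanged.
  (define_subst_attr "ccnz" "subst_ccnz" "" "_ccnz")

  (define_subst "subst_ccnz"
    [(set (match_operand 0 "") (match_operand 1 ""))
     (clobber (reg:CC VAX_PSL_REGNUM))]
    ""
    [(set (reg:CCNZ VAX_PSL_REGNUM)
          (compare:CCNZ (match_dup 1) (const_int 0)))
     (set (match_dup 0) (match_dup 1))])

A pattern then merely carries `<ccnz>' in its name to have both the
clobber and the CCNZ-setting variant generated, and several such
iterators may be applied to a pattern usable with multiple condition
modes.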
Add appropriate subst iterators to all the computational patterns then,
according to the condition codes they usably set, including the DImode
ones, along with a substitute DImode comparison instruction in the
absence of a CMPQ machine instruction.  However do not provide a
`cbranchdi4' named pattern, as without further development it regresses
code quality by resorting to the `__cmpdi2' libcall where a simpler
operation would do, e.g. to check for negativity the TSTL machine
instruction may be executed over the upper longword only.  This is good
material for further work.
Do not apply subst iterators to the increment- or decrement-and-branch
patterns at this time; these may yet have to be reviewed, in particular
whether `*jsobneq_minus_one' is still relevant in the context of the
recent integer constant cost review.
Also add a couple of peepholes to help eliminate comparisons in some
problematic cases.  One is the BIT instruction, a bitwise-AND for
condition codes only, which has no direct counterpart for the actual
calculation, because the BIC instruction, which does do bitwise-AND and
produces a result, implements the operation with a bitwise negation of
its input `mask' operand.  Another is the FFS instruction, which sets
the Z condition code according to its `field' input operand rather than
the result produced.  Finally there are the bit-field comparisons we
have no generic middle-end support for.
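For instance, a peephole along these lines (a sketch under the same
schematic assumptions as above, not necessarily the actual peephole
added) can fold a bitwise-AND whose result is otherwise unused into a
compare-only pattern that BIT can then implement:

  ;; If the result of a bitwise-AND is only used by an immediately
  ;; following comparison against zero and is dead afterwards, then
  ;; the compare-only pattern, i.e. the BIT instruction, will do.
  (define_peephole2
    [(parallel
       [(set (match_operand:SI 0 "register_operand")
             (and:SI (match_operand:SI 1 "general_operand")
                     (match_operand:SI 2 "general_operand")))
        (clobber (reg:CC VAX_PSL_REGNUM))])
     (set (reg:CCNZ VAX_PSL_REGNUM)
          (compare:CCNZ (match_dup 0) (const_int 0)))]
    "peep2_reg_dead_p (2, operands[0])"
    [(set (reg:CCNZ VAX_PSL_REGNUM)
          (compare:CCNZ (and:SI (match_dup 1) (match_dup 2))
                        (const_int 0)))])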
Code size stats are as follows, obtained from 17640 and 9086 executables
built in `check-c' and `check-c++' GCC testing respectively:
               check-c                  check-c++
            samples average  median   samples average  median
---------------------------------------------------------------
regressions    1813  0.578%  0.198%       289  0.349%  0.175%
unchanged     15160  0.000%  0.000%      8662  0.000%  0.000%
progressions    667 -0.589% -0.194%       135 -0.944% -0.191%
---------------------------------------------------------------
total         17640  0.037%  0.000%      9086 -0.003%  0.000%
Outliers:
  old   new  change  %change  filename
----------------------------------------------------
 2406  2950    +544  +22.610  20111208-1.exe
 4314  5329   +1015  +23.528  pr39417.exe
 2235  3055    +820  +36.689  990404-1.exe
 2631  4213   +1582  +60.129  pr57521.exe
 3063  5579   +2516  +82.142  20000422-1.exe
and:
  old   new  change  %change  filename
----------------------------------------------------
 6317  4845   -1472  -23.302  vector-compare-1.exe
 6313  4845   -1468  -23.254  vector-compare-1.exe
 6474  5002   -1472  -22.737  vector-compare-1.exe
 6470  5002   -1468  -22.689  vector-compare-1.exe
We have some code quality regressions like:
10861: 9e ef d9 12 movab 11b40 <p>,r0
10865: 00 00 50
10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
1086c: 02
1086d: d1 60 8f 61 cmpl (r0),$0x64646261
10871: 62 64 64
10874: 13 07 beql 1087d <main_test+0x21>
to:
10861: 9e ef e1 12 movab 11b48 <p>,r0
10865: 00 00 50
10868: 90 a0 03 a0 movb 0x3(r0),0x2(r0)
1086c: 02
1086d: d1 ef d5 12 cmpl 11b48 <p>,$0x64646261
10871: 00 00 8f 61
10875: 62 64 64
10878: 13 07 beql 10881 <main_test+0x25>
(from `memmove-2.x2') due to the constant propagation passes eagerly
replacing pseudo registers with direct symbol references where possible,
which does not happen with CC0 even though the passes do run regardless.
There are further code quality regressions due to earlier compilation
stages trying to push expression evaluation earlier where possible, so
as to keep data dependencies further apart from each other.  This works
well for computations and architectures that do not involve condition
codes set as a side effect of calculations.  However, for integer
negation it results in assembly code like:
movb *8(%ap),%r0
mnegb %r0,%r1
tstb %r0
jeql .L2
the RTL equivalent of which the comparison elimination pass cannot
really do anything about, because the comparison is made on the source
rather than the target operand of the negation (we could add a peephole
for this, but it seems a futile effort, as one would have to iterate
over all the possible such cases), even though this is really
equivalent to:
movb *8(%ap),%r0
mnegb %r0,%r1
jeql .L2
or, if R0 is dead at the conclusion of the branch, even:
mnegb *8(%ap),%r1
jeql .L2
Since the compiler insists on doing the comparison on the source of the
negation, it obviously has to load it into a temporary so as to avoid
accessing the original memory location twice, hence the sequence of
three instructions rather than just a single one. A similar phenomenon
can be observed with the XOR operation and in other cases.
In some cases a comparison does get eliminated, however useless moves
into registers done in preparation for it remain, such as with:
movb *8(%ap),%r2
movb *12(%ap),%r1
subb3 %r1,%r2,%r0
jlssu .L2
where R1 and R2 are both dead at conclusion and therefore:
subb3 *12(%ap),*8(%ap),%r0
jlssu .L2
would obviously do, but a comparison used to precede the branch:
cmpb %r2,%r1
All this looks like material for future improvement.
Test cases for comparison elimination and the peepholes will be supplied
separately.
gcc/
PR target/95294
* config/vax/elf.h (REGISTER_NAMES): Append `%psl'.
* config/vax/vax-modes.def (CCN, CCNZ, CCZ): New modes.
* config/vax/vax-protos.h (vax_select_cc_mode): New prototype.
(vax_maybe_split_dimode_move): Likewise.
(vax_notice_update_cc): Remove prototype.
* config/vax/vax.c (TARGET_FLAGS_REGNUM): New macro.
(TARGET_CC_MODES_COMPATIBLE): Likewise.
(TARGET_MD_ASM_ADJUST): Likewise.
(vax_select_cc_mode): New function.
(vax_cc_modes_compatible): Likewise.
(vax_md_asm_adjust): Likewise.
(vax_notice_update_cc): Remove function.
(vax_output_int_move): Factor out code checking if a DImode move
may have to be split...
(vax_maybe_split_dimode_move): ... into this new function.
* config/vax/vax.h (FIRST_PSEUDO_REGISTER): Bump up.
(FIXED_REGISTERS): Append an entry for PSL.
(CALL_USED_REGISTERS): Likewise.
(NOTICE_UPDATE_CC, OUTPUT_JUMP): Remove macros.
(SELECT_CC_MODE): New macro.
(REGISTER_NAMES): Append `psl'.
* config/vax/predicates.md (const_zero_operand)
(vax_cc_comparison_operator, vax_ccn_comparison_operator)
(vax_ccnz_comparison_operator, vax_ccz_comparison_operator):
New predicates.
* config/vax/builtins.md: Rewrite for MODE_CC representation.
* config/vax/vax.md: Likewise.