This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which results in a lot of register moves that carefully preserve the
nop value. Instead, don't do the stores.
total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs : 910139 -> 895501 (-1.61%)
total local used in shared programs : 15328 -> 15328 (0.00%)
              local        gpr       inst
helped            0       7442       4693
hurt              0         90       2687
Most of the helped/hurt instruction changes are by one or two ops
because we can no longer do quad-wide stores in all cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
}
} else
if (ldst->op == OP_STORE || ldst->op == OP_EXPORT) {
+ if (typeSizeof(ldst->dType) == 4 &&
+ ldst->src(1).getFile() == FILE_GPR &&
+ ldst->getSrc(1)->getInsn()->op == OP_NOP) {
+ delete_Instruction(prog, ldst);
+ continue;
+ }
isLoad = false;
} else {
// TODO: maybe have all fixed ops act as barrier ?