Fix matmul performance on gcc 4.9