The gate count for a 4:1 MUX is estimated at 40 gates. Eight of these (8 x 40) give a total of 320 gates to provide aligned byte-swapping of a 64-bit value.
<img alt="Byte-swap mux diagram" src="../byteswap_mux.svg" width="100%"/>
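
The selection performed by the eight muxes can be sketched behaviourally. This is only an illustration, not the hardware design: it assumes (the text does not state this) that the four inputs of each byte lane's 4:1 MUX correspond to element widths of 1, 2, 4 and 8 bytes, so that lane `i` takes its byte from lane `i XOR (width - 1)`. The function name `byteswap_aligned` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def byteswap_aligned(value: int, width_bytes: int) -> int:
    """Reverse the bytes inside each aligned width_bytes group of a 64-bit value.

    Assumption: one 4:1 mux per byte lane, selecting among the four
    candidate source lanes for widths 1, 2, 4 and 8 bytes.
    """
    if width_bytes not in (1, 2, 4, 8):
        raise ValueError("width_bytes must be 1, 2, 4 or 8")
    value &= MASK64
    result = 0
    for lane in range(8):                      # eight byte lanes, one mux each
        src = lane ^ (width_bytes - 1)         # mux select: which lane feeds this one
        byte = (value >> (8 * src)) & 0xFF
        result |= byte << (8 * lane)
    return result


print(hex(byteswap_aligned(0x0102030405060708, 8)))  # 0x807060504030201
print(hex(byteswap_aligned(0x0102030405060708, 4)))  # 0x403020108070605
```

With `width_bytes` of 1 the value passes through unchanged, which is the fourth mux input under this assumption.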
Each source and destination operand requires 320 gates. With two source operands plus one destination (ALU pipeline, Logical pipeline), this totals 960 gates per Function Unit. With three source operands (madd), it is 1280 gates. For an estimated 50 Function Units, this totals around 50,000 gates.
<img alt="Byte-swap pipe diagram" src="../byteswap_pipe.svg" width="100%"/>
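
The estimate above is simple multiplication, reproduced here as a quick check. The 40-gate MUX figure and the count of 50 Function Units are the estimates from the text. The assumption of exactly one destination operand per Function Unit is inferred from the 960 and 1280 totals, and the function name is hypothetical.

```python
MUX4_GATES = 40          # estimated gates per 4:1 MUX (from the text)
BYTE_LANES = 8           # one MUX per byte of a 64-bit operand
GATES_PER_OPERAND = MUX4_GATES * BYTE_LANES   # 320


def gates_per_function_unit(num_sources: int, num_dests: int = 1) -> int:
    """Byte-swap gates for one Function Unit: every operand gets its own swapper."""
    return (num_sources + num_dests) * GATES_PER_OPERAND


print(gates_per_function_unit(2))        # 960  (ALU, Logical pipelines)
print(gates_per_function_unit(3))        # 1280 (madd)
print(50 * gates_per_function_unit(2))   # 48000, i.e. around 50,000
```

Since some Function Units would have three source operands, the real total would sit somewhat above 48,000, which is consistent with the rounded figure of 50,000.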