countzero: Faster algorithm for count leading/trailing zeroes

This uses an algorithm for count leading/trailing zeroes that is
faster on FPGAs, which makes timing easier. cntlz* and cnttz*
still take two cycles, though.

For count trailing zeroes, we compute x & -x, which for non-zero x
has a single 1 bit in the position of the least-significant 1 bit
in x. This one-hot representation can then be converted to a bit
number with six 32-input OR gates, one per bit of the 6-bit result.
For count leading zeroes, we
simply do a bit-reversal on x and then use the same algorithm.
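
As a rough software model of the scheme described above (a sketch only,
not the hardware implementation): the function names are made up for
illustration, the operand is assumed to be 64 bits wide, and returning
64 for a zero input is an assumption about how that case is handled
separately. The 32-bit variants are not modelled.

```python
MASK64 = (1 << 64) - 1


def one_hot_lowest(x):
    """Isolate the least-significant 1 bit of a 64-bit value via x & -x."""
    return x & -x & MASK64


def one_hot_to_index(onehot):
    """Encode a 64-bit one-hot value as a 6-bit bit number.

    Result bit k is the OR of the 32 one-hot positions whose index has
    bit k set, which models one 32-input OR gate per result bit.
    """
    index = 0
    for k in range(6):
        # Build the mask of the 32 positions feeding OR gate k.
        select = 0
        for pos in range(64):
            if (pos >> k) & 1:
                select |= 1 << pos
        if onehot & select:
            index |= 1 << k
    return index


def bit_reverse64(x):
    """Reverse the order of the 64 bits of x."""
    return int(format(x & MASK64, "064b")[::-1], 2)


def cnttz(x):
    """Count trailing zeroes of a 64-bit value."""
    x &= MASK64
    if x == 0:
        return 64  # assumption: zero input is handled as a special case
    return one_hot_to_index(one_hot_lowest(x))


def cntlz(x):
    """Count leading zeroes: bit-reverse, then count trailing zeroes."""
    return cnttz(bit_reverse64(x))
```

For example, cnttz(12) isolates bit 2 (12 & -12 == 4), and only the OR
gate for result bit 1 sees a set input, giving a bit number of 2.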

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>