countzero: Reorganize to have fewer levels of logic and fewer LUTs
This uses 4:1 multiplexers rather than 2:1 ones, which cuts the
number of levels of multiplexing from 4 to 2 and also reduces the total
number of slice LUTs required. Because we are now handling 4 bits at each
level, including the bottom level, the logic to do the priority
encoding can be factored out into a function that is used at each
level.
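To illustrate the scheme, here is a behavioral sketch in Python, not the VHDL itself. It assumes a 64-bit operand that is searched as 4 groups of 16 bits, then 4 groups of 4 bits, then 4 single bits. That gives two 4:1 selection steps and one shared 4-bit priority encoder used at every level. The names `encoder`, `group_nonzero` and `count_zeros` are hypothetical, and 32-bit mode is left out.

```python
def encoder(v: int, right: bool) -> int:
    """Priority-encode a 4-bit value v.

    Returns the index (0-3) of the leftmost 1 (right=False) or the
    rightmost 1 (right=True). For v == 0 the output is still defined
    (0 or 3) but meaningless, like a hardware encoder whose output is
    simply ignored in that case.
    """
    if not right:
        for i in (3, 2, 1):
            if v & (1 << i):
                return i
        return 0
    for i in (0, 1, 2):
        if v & (1 << i):
            return i
    return 3


def group_nonzero(x: int, width: int) -> int:
    """Pack 4 flags: bit i is set if group i (width bits wide) of x is nonzero."""
    mask = (1 << width) - 1
    return sum(1 << i for i in range(4) if (x >> (i * width)) & mask)


def count_zeros(rs: int, right: bool) -> int:
    """Count leading (right=False) or trailing (right=True) zeros of a
    64-bit value using the same encoder at each of the three levels."""
    rs &= (1 << 64) - 1
    # Level 1: which 16-bit group holds the leftmost/rightmost 1?
    v16 = encoder(group_nonzero(rs, 16), right)
    y = (rs >> (v16 * 16)) & 0xFFFF   # first 4:1 selection
    # Level 2: which 4-bit group within y?
    v4 = encoder(group_nonzero(y, 4), right)
    x = (y >> (v4 * 4)) & 0xF         # second 4:1 selection
    # Bottom level: which bit within x?
    sel = encoder(x, right)
    bitpos = v16 * 16 + v4 * 4 + sel  # index of the selected 1 bit
    # Encoding and selection above run unconditionally; the zero test
    # only decides what is returned as the result.
    if rs == 0:
        return 64
    return bitpos if right else 63 - bitpos
```

For example, `count_zeros(0xF0, False)` gives 56 and `count_zeros(0xF0, True)` gives 4. Every level calls the same `encoder`. This is possible only because each level now deals with exactly 4 candidates.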
This rearranges the logic so that the encoding and selection of bits
is done whether or not the input operand is zero, and the if statement
testing whether the input is zero only affects what is assigned to
result. With this arrangement, no latches are inferred, and we can go
back to using signals rather than variables.
Also add some comments about what is being done.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>