uart: Eliminate systemic baud rate error with low divisor values
This refactors the receiver logic to compensate for the case of the baud
rate divisor not being multiple of the oversampling period.
Previously, the bit time was effectively rounded to (s * floor(div / s))
cycles, where "s" is the oversampling factor - the number of
intermediate samples for each data bit. The remainder r = (div % s) was
ignored, thereby resulting in gradually accumulated drift that became
significant for divisor values on the same order of magnitude as "s".
The revised approach inserts the required additional delay by extending
the last "r" samples of a given data bit by one cycle each.