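the most efficient division algorithm is probably Knuth's Algorithm D (with modifications from the exercises section of his book, The Art of Computer Programming vol. 2), which is O(n^2) in the number of words and uses a 2N-by-N-bit div/rem (2N-bit dividend, N-bit divisor) as its basic step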
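
to make the 2N-by-N-bit div/rem step concrete, here is a minimal sketch for N = 32. the name `divrem_2n_by_n` and its signature are hypothetical (not taken from Knuth or from any library), and the precondition `hi < d` is a simplifying assumption so that the quotient fits in a single 32-bit word:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical helper, for illustration only:
 * divides the 64-bit value (hi:lo) by the 32-bit divisor d,
 * returns the 32-bit quotient and stores the remainder in *rem.
 * assumption: d != 0 and hi < d, so the quotient fits in 32 bits. */
static uint32_t divrem_2n_by_n(uint32_t hi, uint32_t lo, uint32_t d,
                               uint32_t *rem) {
    assert(d != 0 && hi < d);
    uint64_t n = ((uint64_t)hi << 32) | lo; /* 2N-bit dividend */
    *rem = (uint32_t)(n % d);               /* N-bit remainder */
    return (uint32_t)(n / d);               /* N-bit quotient */
}
```

Algorithm D uses a step of roughly this shape once per quotient word, dividing the top two words of the running remainder by the top word of the divisor to get an estimate that is then corrected.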
an oversimplified version of Knuth's Algorithm D with 32-bit words is:
-(TODO find original)
+(TODO find original: <https://raw.githubusercontent.com/hcs0/Hackers-Delight/master/divmnu64.c.txt>)
```
void div(uint32_t *n, uint32_t *d, uint32_t *q, int n_bytes, int d_bytes) {