allows the instruction to perform full parallel vector div/mod,
or act in loop-back mode for big-int division by a scalar,
or for a single scalar 128/64 div/mod.
+
+Just as with `divdeu`, on which this instruction is based, overflow
+detection is required. When the divisor is too small compared to
+the dividend, the result may not fit into 64 bits. Knuth's
+original algorithm detects overflow and manually places 0xffffffff
+(all ones) into `qhat`. It makes sense for `divrem2du` to do
+this, and also to return an overflow indicator: this can be done
+by always setting Rc=1, which also saves opcode space.
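
The overflow rule described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the instruction's definition: the function name `divrem2du_sketch`, its argument order, and the remainder value returned on overflow are all hypothetical, and the 64-bit all-ones quotient is assumed to be the analogue of Knuth's 0xffffffff for 32-bit digits.

```python
MASK64 = (1 << 64) - 1  # 64-bit all-ones value


def divrem2du_sketch(hi, lo, divisor):
    """Model an unsigned 128-by-64 divide with overflow detection.

    hi and lo are the upper and lower 64 bits of the dividend.
    Returns (quotient, remainder, overflow).
    """
    for operand in (hi, lo, divisor):
        assert 0 <= operand <= MASK64, "operands must fit in 64 bits"

    # The quotient fits in 64 bits only when hi < divisor.
    # This single comparison also catches divisor == 0.
    if hi >= divisor:
        # Assumption: saturate the quotient to all ones, as Knuth's
        # algorithm does for qhat. The remainder on overflow is not
        # given in the text, so 0 is a placeholder here.
        return MASK64, 0, True

    dividend = (hi << 64) | lo
    return dividend // divisor, dividend % divisor, False
```

Note that an all-ones quotient can also be a legitimate result: `divrem2du_sketch(MASK64 - 1, MASK64, MASK64)` produces it with no overflow. The saturated value alone is therefore ambiguous, which is one reason a separate overflow indicator is useful.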