# PartitionedSignal nmigen-aware eq (assign)

* <https://bugs.libre-soc.org/show_bug.cgi?id=709>

Copying (assigning) a PartitionedSignal to a PartitionedSignal
of equal width presents no issue.  However, if the source is
wider than the target, *partition-aware* truncation must occur.
In the opposite case, *partition-aware* sign or zero extension
must occur.  Finally, a scalar source (a Signal or Const) must be
duplicated across all partitions, again following the same
truncation and sign/zero extension rules.

Take two PartitionedSignals (source a, dest b), each 32 bits wide:

    partition:      p    p    p       (3 bits)
    a        :  AAA3 AAA2 AAA1 AAA0  (32 bits)
    b        :  BBB3 BBB2 BBB1 BBB0  (32 bits)

For all partition settings this copies verbatim.
However, when A is either shorter or longer than B, the
behaviour depends on the partition settings.

If A is shorter than B:
    partition:      p    p    p       (3 bits)
    a        :  A7A6 A5A4 A3A2 A1A0  (8 bits)
    b        :  BBB3 BBB2 BBB1 BBB0  (16 bits)

then it matters what the partition settings are:

| partition | o3         | o2         | o1         | o0         |
| --------- | --         | --         | --         | --         |
| 000       | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
| 001       | [A7A7A7A7] | [A7A7]A7A6 | A5A4A3A2   | [A1A1]A1A0 |
| 010       | [A7A7A7A7] | A7A6A5A4   | [A3A3A3A3] | A3A2A1A0   |
| 011       | [A7A7A7A7] | A7A6A5A4   | [A3A3]A3A2 | [A1A1]A1A0 |
| 100       | [A7A7]A7A6 | [A5A5A5A5] | [A5A5]A5A4 | A3A2A1A0   |
| 101       | [A7A7]A7A6 | [A5A5A5A5] | A5A4A3A2   | [A1A1]A1A0 |
| 110       | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3A3A3] | A3A2A1A0   |
| 111       | [A7A7]A7A6 | [A5A5]A5A4 | [A3A3]A3A2 | [A1A1]A1A0 |

where the bits in square brackets are zero if A is unsigned, and
contain the indicated sign bits if A is signed.  Here, each partition
of A is copied into the corresponding (larger) partition of B and then,
depending on whether A is signed or unsigned, sign-extended or
zero-extended *on a per-partition basis*.
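
The per-partition extension in the table above can be modelled in plain
Python.  This is only a behavioural sketch operating on integers, not the
nmigen implementation: the names `lane_groups` and `partitioned_extend`
are hypothetical, and it assumes (as the table rows suggest) that the
rightmost partition bit controls the point between o0 and o1, with a 1
meaning that the point is closed.

```python
def lane_groups(mask, nlanes=4):
    """Return (first_lane, lane_count) for each active partition.

    Assumption: bit i of mask set means the partition point between
    lane i and lane i+1 is closed (so 0b111 is 4x SIMD, 0b000 is 1x).
    """
    groups, start = [], 0
    for lane in range(1, nlanes):
        if (mask >> (lane - 1)) & 1:
            groups.append((start, lane - start))
            start = lane
    groups.append((start, nlanes - start))
    return groups


def partitioned_extend(a, mask, a_width=8, b_width=16, nlanes=4, signed=False):
    """Copy narrower partitioned A into wider partitioned B, per partition."""
    a_lane, b_lane = a_width // nlanes, b_width // nlanes
    out = 0
    for start, count in lane_groups(mask, nlanes):
        src_bits, dst_bits = a_lane * count, b_lane * count
        # extract this partition's slice of A
        val = (a >> (start * a_lane)) & ((1 << src_bits) - 1)
        # sign-extend within the partition only; zero-extension is implicit
        if signed and (val >> (src_bits - 1)) & 1:
            val |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
        out |= val << (start * b_lane)
    return out
```

With `a = 0x9C` and partition `001`, the signed result is `0xFE70`: o0 holds
A1A0 extended to 4 bits, and the 12-bit partition o3-o2-o1 holds A7..A2
sign-extended from 6 bits.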

For A longer than B:

    partition:      p    p    p       (3 bits)
    a        :  AAAA AAAA AAAA AAAA  (16 bits)
    b        :  B7B6 B5B4 B3B2 B1B0  (8 bits)

truncation occurs at different points depending on partitions:

| partition | o3     | o2   | o1   | o0     |
| --------- | --     | --   | --   | --     |
| 000       | A7A6   | A5A4 | A3A2 | A1A0   |
| 001       | A9A8   | A7A6 | A5A4 | A1A0   |
| 010       | A11A10 | A9A8 | A3A2 | A1A0   |
| 011       | A11A10 | A9A8 | A5A4 | A1A0   |
| 100       | A13A12 | A5A4 | A3A2 | A1A0   |
| 101       | A13A12 | A7A6 | A5A4 | A1A0   |
| 110       | A13A12 | A9A8 | A3A2 | A1A0   |
| 111       | A13A12 | A9A8 | A5A4 | A1A0   |

In effect, copying starts from the beginning of each partition of A
and ends when the corresponding partition of B is full, i.e. when the
next closed partition point is found.
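
The truncation table can be reproduced with a small bit-map helper.  As
before, this is a hedged pure-Python sketch rather than the actual
PartitionedSignal code; `lane_groups`, `truncate_bit_map` and
`partitioned_truncate` are illustrative names, and the partition-bit
ordering (rightmost bit is the point between o0 and o1, 1 means closed)
is inferred from the table.

```python
def lane_groups(mask, nlanes=4):
    """Return (first_lane, lane_count) for each active partition.

    Assumption: bit i of mask set means the partition point between
    lane i and lane i+1 is closed.
    """
    groups, start = [], 0
    for lane in range(1, nlanes):
        if (mask >> (lane - 1)) & 1:
            groups.append((start, lane - start))
            start = lane
    groups.append((start, nlanes - start))
    return groups


def truncate_bit_map(mask, a_width=16, b_width=8, nlanes=4):
    """For each bit of B (LSB first), give the index of the A bit copied in."""
    a_lane, b_lane = a_width // nlanes, b_width // nlanes
    src = []
    for start, count in lane_groups(mask, nlanes):
        # take only as many bits as fit in B's partition, starting at the
        # bottom of the matching partition of A
        src.extend(start * a_lane + i for i in range(b_lane * count))
    return src


def partitioned_truncate(a, mask, a_width=16, b_width=8, nlanes=4):
    """Apply the bit map to an integer value of A."""
    bits = truncate_bit_map(mask, a_width, b_width, nlanes)
    return sum(((a >> s) & 1) << d for d, s in enumerate(bits))
```

For partition `001`, `truncate_bit_map` gives `[0, 1, 4, 5, 6, 7, 8, 9]`,
matching the row `A9A8 | A7A6 | A5A4 | A1A0`.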

# Scalar source

When the source A is scalar and is equal to or wider than
the destination, it must be copied across multiple
partitions:

    partition:      p    p    p       (3 bits)
    a        :  AAAA AAAA AAAA AAAA  (16 bits)
    b        :  B7B6 B5B4 B3B2 B1B0  (8 bits)

As the source is scalar, it must be copied (broadcast) into each
partition of the output, B, starting at the beginning of each
partition. With each partition being narrower than A (except
in one case), truncation is guaranteed. The exception is when
all partitions are open (1x) and A and B are the same width.

The partition options are:

| partition | o3   | o2   | o1   | o0     |
| --------- | --   | --   | --   | --     |
| 000       | A7A6 | A5A4 | A3A2 | A1A0   |
| 001       | A5A4 | A3A2 | A1A0 | A1A0   |
| 010       | A3A2 | A1A0 | A3A2 | A1A0   |
| 011       | A3A2 | A1A0 | A1A0 | A1A0   |
| 100       | A1A0 | A5A4 | A3A2 | A1A0   |
| 101       | A1A0 | A3A2 | A1A0 | A1A0   |
| 110       | A1A0 | A1A0 | A3A2 | A1A0   |
| 111       | A1A0 | A1A0 | A1A0 | A1A0   |

When the partitions are all open (1x) only the bits that will fit across
the whole of the target are copied.  In this example, B is 8 bits so only
8 bits of A are copied.

When the partitions are all closed (4x SIMD) each partition of B is
2 bits wide, therefore only the *first two* (least significant) bits
of A, A1A0, are copied into *each* of the four 2-bit partitions in B.
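
A minimal integer model of this broadcast-with-truncation is sketched
below.  It is not taken from the PartitionedSignal source: the function
names are hypothetical, and the meaning of the partition bits (rightmost
bit is the point between o0 and o1, 1 means closed) is an assumption
drawn from the table.

```python
def lane_groups(mask, nlanes=4):
    """Return (first_lane, lane_count) for each active partition.

    Assumption: bit i of mask set means the partition point between
    lane i and lane i+1 is closed.
    """
    groups, start = [], 0
    for lane in range(1, nlanes):
        if (mask >> (lane - 1)) & 1:
            groups.append((start, lane - start))
            start = lane
    groups.append((start, nlanes - start))
    return groups


def scalar_broadcast_truncate(a, mask, b_width=8, nlanes=4):
    """Broadcast scalar A (at least b_width bits wide) into each partition of B."""
    b_lane = b_width // nlanes
    out = 0
    for start, count in lane_groups(mask, nlanes):
        dst_bits = b_lane * count
        # every partition receives the low dst_bits of the same scalar
        out |= (a & ((1 << dst_bits) - 1)) << (start * b_lane)
    return out
```

With `a = 0xABCD`, partition `111` yields `0x55` (A1A0 = `01` in all four
lanes) and partition `000` yields `0xCD` (the low 8 bits of A).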

For the case where A is narrower than the output B, sign or zero
extension is required. Here we assume A is 8 bits and B is 16 bits.
This is similar to the parallel case, except that A is repeated
(broadcast) across all of B.

| partition | o3         | o2         | o1         | o0         |
| --------- | --         | --         | --         | --         |
| 000       | [A7A7A7A7] | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
| 001       | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   |
| 010       | A7A6A5A4   | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   |
| 011       | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   |
| 100       | A3A2A1A0   | [A7A7A7A7] | A7A6A5A4   | A3A2A1A0   |
| 101       | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   | A3A2A1A0   |
| 110       | A3A2A1A0   | A3A2A1A0   | A7A6A5A4   | A3A2A1A0   |
| 111       | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   | A3A2A1A0   |

Note that when all partitions are open (1x 16-bit output),
all of A is copied out and either zero or sign extended
into the top half of the output.  At the other extreme are the
4x 4-bit output partitions, each of which holds a copy of A
truncated to its lowest 4 bits.

Unlike the parallel case, A is not itself partitioned, so as much of it
as will fit is copied over.  In some cases, such as `1x 4-bit, 1x 12-bit`
(partition mask = `0b100`, above), the 8-bit scalar source is truncated
to 4 bits when copied into the highest part of B (o3), because that
partition of B is only 4 bits wide, whereas for the 12-bit partition
(o2-o1-o0) the same 8-bit scalar source, A, needs sign or zero extending.
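
The mixed case, where the same scalar is truncated in one partition and
extended in another, can be sketched as follows.  This is an illustrative
pure-Python model under the same assumptions as the tables (rightmost
partition bit is the point between o0 and o1, 1 means closed); the names
`lane_groups` and `scalar_assign` are hypothetical and not part of any
existing API.

```python
def lane_groups(mask, nlanes=4):
    """Return (first_lane, lane_count) for each active partition.

    Assumption: bit i of mask set means the partition point between
    lane i and lane i+1 is closed.
    """
    groups, start = [], 0
    for lane in range(1, nlanes):
        if (mask >> (lane - 1)) & 1:
            groups.append((start, lane - start))
            start = lane
    groups.append((start, nlanes - start))
    return groups


def scalar_assign(a, mask, a_width=8, b_width=16, nlanes=4, signed=False):
    """Broadcast scalar A into B, truncating or extending per partition."""
    b_lane = b_width // nlanes
    a &= (1 << a_width) - 1
    out = 0
    for start, count in lane_groups(mask, nlanes):
        dst_bits = b_lane * count
        val = a
        # partition wider than A: sign-extend if requested (zero is implicit)
        if dst_bits > a_width and signed and (a >> (a_width - 1)) & 1:
            val |= ((1 << dst_bits) - 1) & ~((1 << a_width) - 1)
        # partition narrower than A: keep only the low dst_bits
        val &= (1 << dst_bits) - 1
        out |= val << (start * b_lane)
    return out
```

For `a = 0x9C` and partition `100`, the signed result is `0xCF9C`: o3
receives the truncated low nibble `0xC`, while the 12-bit partition
o2-o1-o0 receives `0xF9C`, the sign-extended scalar.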