# Alternative (SVPrefix) format

This VBLOCK mode effectively extends [[sv_prefix_proposal]] to cover multiple
registers. The basic principle: the "prefix" specifies which of the source and
destination registers are to be considered "vectors" (or scalars). However,
where in SVPrefix that applies to only one instruction, here the "vector" tag
designations *continue to cascade* into subsequent instructions within the
VBLOCK.

Its advantage over the main format is that the main format requires
explicit naming of the registers to be tagged (taking up 5 bits each time),
whereas here the registers are named by the instructions themselves.

| 15   | 14:12 | 11:10 | 9    | 8:7     | 6:0     |
| -    | ----- | ----- | ---- | ------- | ------- |
| rsvd | 16xil | rsvd  | rsvd | SVPMode | 1111111 |

| SVPMode  | 1:0  |
| -------  | ---  |
| non-SVP  | 0b00 |
| P48 Mode | 0b01 |
| P64 Mode | 0b10 |
| Twin-SVP | 0b11 |
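
As an illustrative sketch only (not part of the proposal), the two tables above can be read as a decoder for the 16-bit header. The Python names `decode_header`, `SvpVBlockHeader` and `SVP_MODES` are hypothetical, and the reserved bits (15, 11:10 and 9) are simply ignored here rather than checked.

```python
from typing import NamedTuple

# SVPMode encodings, taken from the table above (header bits 8:7).
SVP_MODES = {0b00: "non-SVP", 0b01: "P48", 0b10: "P64", 0b11: "Twin-SVP"}

# Bits 6:0 of the header are all ones.
VBLOCK_OPCODE = 0b1111111


class SvpVBlockHeader(NamedTuple):
    xil: int        # bits 14:12, the "16xil" field
    svp_mode: str   # decoded name of the SVPMode field


def decode_header(halfword: int) -> SvpVBlockHeader:
    """Split a 16-bit SVP-VBLOCK header into its named fields."""
    if halfword & 0x7F != VBLOCK_OPCODE:
        raise ValueError("not a VBLOCK header: bits 6:0 must be 1111111")
    xil = (halfword >> 12) & 0b111    # bits 14:12
    mode = (halfword >> 7) & 0b11     # bits 8:7
    return SvpVBlockHeader(xil=xil, svp_mode=SVP_MODES[mode])


# 16xil=2, SVPMode=0b01 (P48 Mode), opcode bits all ones
print(decode_header(0x20FF))
```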

non-SVP mode uses the extended format (see the main VBLOCK spec, [[vblock_format]]).

When P48 Mode is enabled (0b01), the P48 prefix follows the VBLOCK header:

| 15:11 | 10:0       |
| -     | ---------- |
| rsvd  | P48-Prefix |

When P64 Mode is enabled (0b10), the P64 prefix also follows:

| 31:16      | 15:11 | 10:0       |
| ---------- | -     | ---------- |
| P64-prefix | rsvd  | P48-Prefix |

When Twin-SVP Mode is enabled (0b11), a *second* P48-P64 prefix pair follows
in the VBLOCK, and applies its vector context to the registers named by the
*second* instruction.

# Rules

* SVP-VBLOCK is read (48/64), and indicates that certain registers are
  to be "tagged". Element widths and predication are also specified.
* The very first instruction (RVC, OP32) within the VBLOCK says **which**
  registers those tags are to be associated with.
* Those registers **remain** tagged with that context *for the entire duration
  of the VBLOCK*.
* At the end of the VBLOCK the context terminates and the tags are discarded.
* There is a rule in SVP about vs#/vd# fields: if they are not present in
  a given P48/P64 prefix, an "implicit" field is created for that src or
  dest register in the form of a bitwise "OR" of all present vs#/vd# fields.
  *This rule continues to apply* to the instructions following the first
  (and second, if applicable) in the VBLOCK; however, the ORing rule
  *stops* there, i.e. it does not cascade via rd into the following instructions.
* If an instruction is used where registers are implicitly determined to be
  scalars, they *remain* scalars when used in subsequent instructions.

Example (contrived):

* VBLOCK, P48 prefix only (SVPMode=0b01), vs1=1, vs2=0
* 1st instruction in VBLOCK: ADD x3, x5, x12
* 2nd instruction in VBLOCK: ADD x7, x5, x3
* 3rd instruction in VBLOCK: ADD x9, x4, x4
* 4th instruction in VBLOCK: ADD x7, x5, x4

Step by step:

* vs1=1 indicates that the source register rs1 is to be considered a vector,
  whilst vs2=0 indicates that rs2 is to be a "scalar".
* The first instruction has "x5" as rs1. It is therefore "marked" as a vector.
* However, with there being no "specifier" for vd in the P48 prefix, vd is
  calculated as "vd = vs1 | vs2" and is therefore set to "1".
* The "full" specification for the 1st add is therefore
  "ADD vector-x3, vector-x5, scalar-x12".
* The second instruction also uses x5; however, x3 is also now considered
  a "vector", and, consequently, so is x7.
* The "full" specification for the 2nd add is therefore
  "ADD vector-x7, vector-x5, vector-x3".
* The 3rd instruction has no context applied to any of its registers, therefore
  x9 and x4 are determined to be "scalar".
* The specification for the 3rd add is therefore
  "ADD scalar-x9, scalar-x4, scalar-x4".
* The 4th instruction: **despite** x7 being used as a vector in instruction 2,
  x7 is **not** listed in the 1st instruction's operands. Likewise for x4.
  Therefore the "OR" rule applies to them.
* x5, on the other hand, *is* in the 1st instruction's operands and is tagged
  as a vector. Given that x4 and x7 have the "OR" rule applied, they are also
  marked as "vector", *despite x4 being formerly scalar in the 3rd instruction*.
* Therefore, the "full" specification for the 4th add is:
  "ADD vector-x7, vector-x5, vector-x4".

Writing those out separately, for clarity:

    ADD vector-x3, vector-x5, scalar-x12   # from vs1=1, vs2=0, vd=vs1|vs2
    ADD vector-x7, vector-x5, vector-x3    # x7: v-x5 | v-x3
    ADD scalar-x9, scalar-x4, scalar-x4    # x9, x4 not prefixed, therefore scalar
    ADD vector-x7, vector-x5, vector-x4    # x4, x7, x5 vector
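
The resolution above can be modelled in a few lines of Python. This is a sketch of one reading of the worked example, not a normative definition: the names `first_instruction_tags` and `resolve` are hypothetical, Twin-SVP is not modelled, and the assumption made is that an untagged register takes the OR of the tagged registers named by the same instruction. The tag table is built once, from the prefix and the first instruction, and is never modified afterwards.

```python
def first_instruction_tags(vs1, vs2, first, vd=None):
    """Build the register tags from the prefix bits and the 1st instruction.

    `first` is (rd, rs1, rs2). vd=None models a prefix with no vd field,
    in which case vd is the OR of the fields that are present.
    Registers repeated within the first instruction get no special handling.
    """
    rd, rs1, rs2 = first
    if vd is None:
        vd = vs1 | vs2
    return {rs1: bool(vs1), rs2: bool(vs2), rd: bool(vd)}


def resolve(tags, instructions):
    """Return the 'full' vector/scalar specification of each instruction."""
    resolved = []
    for regs in instructions:
        # OR of the vector bits of every tagged register named here.
        implicit = any(tags.get(r, False) for r in regs)
        # Tagged registers keep their tag; untagged ones take the OR.
        resolved.append(tuple(
            ("vector-" if tags.get(r, implicit) else "scalar-") + r
            for r in regs
        ))
    return resolved


program = [
    ("x3", "x5", "x12"),
    ("x7", "x5", "x3"),
    ("x9", "x4", "x4"),
    ("x7", "x5", "x4"),
]
tags = first_instruction_tags(vs1=1, vs2=0, first=program[0])
for rd, rs1, rs2 in resolve(tags, program):
    print(f"ADD {rd}, {rs1}, {rs2}")
```

Because `tags` is never updated after the first instruction, x7 is not carried over as a vector from the 2nd instruction to the 4th; it resolves to a vector in both only because x5 is present each time.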

Twin-SVP mode allows certain registers to be explicitly marked as "scalar",
where some of the rules might otherwise start to cascade through and cause
registers to become undesirably marked as "vectors".

The OR rule cannot cascade onwards because, if a trap occurs and the context has to be re-established, it can then be re-established purely from the VBLOCK header and by decoding the first (and second) instruction.

If the cascade of what was marked "vector" were allowed to continue, re-establishing the full cascade context would require re-reading every opcode up to the point where execution of the VBLOCK left off.

# Discussion

* <https://groups.google.com/forum/#!topic/comp.arch/l2nzme2sCR0>
* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002622.html>