--- /dev/null
+# Elliptic Curve ed25519
+
+Links:
+
+* [ed25519_mul.py](https://git.libre-soc.org/?p=openpower-isa.git;a=blob_plain;f=src/openpower/decoder/isa/ed25519/curve25519_mul.py;hb=de10b86f7b3)
+* [donna-ed25519](https://bugs.libre-soc.org/show_bug.cgi?id=773#c1)
+* [Triangular REMAP discussion](https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-May/004881.html)
+
+ed25519 is strategically important as its implementation was highly optimised
+during its design, for high security.
+[Edwards-curve Digital Signature Algorithm (EdDSA)](https://en.wikipedia.org/wiki/EdDSA#Ed25519)
+was also designed to be fast.
+
+In the donna-ed25519 implementation, key functions such as ed25519_mul
+are laid out explicitly by loop-unrolling:
+
+```
+t[0] = r0 * s0
+t[1] = r0 * s1 + r1 * s0;
+t[2] = r0 * s2 + r1 * s1 + r2 * s0;
+t[3] = r0 * s3 + r1 * s2 + r2 * s1 + r3 * s0;
+t[4] = r0 * s4 + r1 * s3 + r2 * s2 + r3 * s1 + r4 * s0;
+```
+
+Note the very obvious patterns here which are triangular in nature.
+With the very existence of Simple-V's REMAP subsystem it is quite
+natural to see if triangular remapping can be added and used.
+It turns out to be quite easy, and there are two possible techniques:
+Vertical-First and Horizontal-First.
+
+With Vertical-First, the multiply is done first as a scalar item,
+into a temporary register, followed by an addition of the scalar
+into the actual target (t0 thru t4)
+
+```
+sv.mul temp, *r, *s # temporary target scalar register
+sv.add *t,*t,temp # add temporary scalar onto target vector
+```
+
+With Horizontal-First it is extremely simple: use `madd` -
+integer multiply-and-accumulate:
+
+```
+sv.madd *t, *r, *s
+```
+
+In both cases, all three target registers are set up with the
+same REMAP Schedules. Additionally in both cases, t0-t4 must
+be pre-initialised to zeros.
+
+As always with Simple-V, the power of simplicity comes primarily
+from the REMAP subsystem. However in a secure environment,
+reduced instruction count is also critical not just for power
+consumption but to get the size of the binary down small enough
+that it could fit easily into a few lines of L1 Cache.
+If a huge number of loop-unrolled instructions (the normal
+way of handing these algorithms) are reduced down to a
+bare handful, with the looping covered in hardware, then it
+is easy to understand how valuable Simple-V and REMAP is.