# Crypto-router ASIC

**This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No 871528.**

**This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-ASSURE Project funded under grant agreement No 957073.**

* NLnet page: [[nlnet_2021_crypto_router]]
* Top-level bugreport:
* ASIC/IO Pin specification page: [[crypto_router_asic/crypto_router_pinspec]]

# Goal

To build the foundations of a cryptographic extension of the POWER ISA, allowing anyone interested to build upon this effort and make a Cryptorouter FPGA or ASIC for oneself.

# Deliverables

See top-level bugreport [#589](https://bugs.libre-soc.org/show_bug.cgi?id=589#c0) - all Milestones were achieved 100% successfully as defined, including one additional Milestone added after the initial approval in 2021, for [power-modulo](https://bugs.libre-soc.org/show_bug.cgi?id=1044) arithmetic (the basis of RSA, DH etc.).

**1) A set of general-purpose scalar instructions suitable for cryptographic applications as well as many other purposes**

See [Big integer arithmetic (bigint)](/openpower/sv/biginteger) and [Bit manipulation (bitmanip)](/openpower/sv/bitmanip) for rationale, instruction list and definition in pseudo-code.

Relevant milestones:

* [Bug 770](https://bugs.libre-soc.org/show_bug.cgi?id=770): 1. Discussion and Finalisation of Which Cryptographic Primitives to Implement
* [Bug 776](https://bugs.libre-soc.org/show_bug.cgi?id=776): 7.
  Documentation of designs, code, processes, and other relevant things as needed

**2) Implementation and validation of the above instructions on the ISA simulator**

As with all large software projects, the implementation is scattered within the simulator code, which is available at:

Unit tests are available at:

* [Test cases for bitmanip instructions](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bitmanip/bitmanip_cases.py;h=93476025fc31dc5d42d4a86a27d4b826810436e2;hb=HEAD)
* [Test cases for bigint instructions](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/bigint_cases.py;h=2944ad431e586dca1b572f4be4c1a9c7a3e82e89;hb=HEAD)

These tests use the ISA Simulator (see [Simulator Test API](/docs/testapi)). To run the above test cases, [install the developer environment](/HDL_workflow/devscripts), go to the `~/src/openpower-isa/src/openpower/decoder/isa` directory, and run `python3 test_caller_bigint.py` and `python3 test_caller_bitmanip.py`.

Relevant Milestone:

* [Bug 771](https://bugs.libre-soc.org/show_bug.cgi?id=771): 2.
  Creation of Cryptographic-Primitive OpenPower ISA Pseudo-code

**3) Reference HDL implementation of some instructions** (full implementation was not possible within the limited 2021-02-051 budget [[nlnet_2021_crypto_router]])

Code and tests are available:

* [HDL implementation of Ternlogi bitmanip instruction](https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/lut.py;h=755747ab2073dbf1a7620f9ac31e592b2bf63a44;hb=HEAD)
* [HDL implementation of Grev bitmanip instruction](https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/grev.py;h=2b22fe1bf35ba3e5f2787b62bbf36c329a444787;hb=HEAD)
* [HDL Implementation of Galois Field instructions](https://git.libre-soc.org/?p=nmigen-gf.git;a=tree;f=src/nmigen_gf/hdl;hb=bc0c03b3df2fa19189aaa2b61a101cdc8ebf1beb)
* [Unit test for the HDL implementation of Ternlogi](https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=e0a98099460ded8912299b05c513dc0f924005d7;hb=HEAD)
* [Unit test for the HDL implementation of Grev](https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_grev.py;h=780239d8a13b2954a7953d5d2e312dd517a80347;hb=HEAD)
* [Formal verification for the HDL implementation of Ternlogi](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/formal/proof_main_stage.py;h=379211d623a01259f77c90229cae0d57f40228a7;hb=HEAD#l311)
* [Formal verification for the HDL implementation of Grev](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/formal/proof_main_stage.py;h=379211d623a01259f77c90229cae0d57f40228a7;hb=HEAD#l321)
* [Unit test and formal verification for the HDL implementation of Galois Field instructions](https://git.libre-soc.org/?p=nmigen-gf.git;a=tree;f=src/nmigen_gf/hdl/test;hb=bc0c03b3df2fa19189aaa2b61a101cdc8ebf1beb)

To run the HDL tests, [install the developer environment](/HDL_workflow/devscripts) and run the test scripts referenced above directly.

Relevant Milestones:

* [Bug 772](https://bugs.libre-soc.org/show_bug.cgi?id=772): 3.
  Creation of the HDL Code for the Instructions and Associated Unit-Tests
* [Bug 840](https://bugs.libre-soc.org/show_bug.cgi?id=840): 8. Formal proofs and unit tests for cryptoprimitives

**4) Additional specification of and simulation for concepts like a REMAP engine and element width overrides**

Once these are also implemented in HDL, they will allow hyper-efficient acceleration of many fundamental crypto algorithms in hardware.

* [REMAP documentation](https://libre-soc.org/openpower/sv/remap/)
* [Element width overrides documentation](https://libre-soc.org/openpower/sv/overview/#elwidths)

These are implemented 100% in the ISA simulator, so Simple-V PowerISA assembly code can already be written and simulated successfully. Once the HDL for these key critical parts of SV is available (when funded), the exact same assembly code that runs under the simulator may, as usual, be run on an FPGA or ASIC. (The limited budget of 2021-02-051 was insufficient to complete the HDL implementation.)

**5) Implementation of a few cryptographic primitives that also happen to help accelerate cryptographic algorithms**

Cryptographic algorithms routinely use multi-byte quantities, often far wider than a single 64-bit register (RSA moduli, for example, are commonly 2048 bits or more).
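Such wide integers are usually stored as arrays of 64-bit words ("limbs"). A wide multiply can then be built from a scalar 64x64-bit multiply that returns both halves of the 128-bit product, combined with add-with-carry. The following is a minimal, hypothetical pure-Python sketch of schoolbook limb multiplication along those lines. `mul_add_64` is only loosely modelled on the multiply-add-with-high-half style of the `maddedu` instruction, and its exact semantics here are an assumption. All helper names are illustrative, not Libre-SOC APIs, and Python's unbounded integers stand in for 64-bit registers.

```python
import random

MASK64 = (1 << 64) - 1


def to_limbs(x: int, n: int) -> list:
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs: list) -> int:
    """Reassemble little-endian 64-bit limbs into a single integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def mul_add_64(a: int, b: int, c: int) -> tuple:
    """Hypothetical scalar op: (low, high) 64-bit halves of a*b + c."""
    p = a * b + c
    return p & MASK64, p >> 64


def bigint_mul(a_limbs: list, b_limbs: list) -> list:
    """Schoolbook product of two little-endian limb arrays."""
    n = len(a_limbs)
    result = [0] * (n + len(b_limbs))
    for j, b in enumerate(b_limbs):
        carry = 0
        for i, a in enumerate(a_limbs):
            # multiply-add yielding both product halves...
            lo, hi = mul_add_64(a, b, carry)
            # ...then add-with-carry into the running partial result
            s = result[i + j] + lo
            result[i + j] = s & MASK64
            carry = hi + (s >> 64)  # provably still fits in 64 bits
        result[j + n] = carry
    return result


if __name__ == "__main__":
    x, y = random.getrandbits(256), random.getrandbits(256)
    product = from_limbs(bigint_mul(to_limbs(x, 4), to_limbs(y, 4)))
    assert product == x * y
    print("256x256-bit limb multiply matches Python's int multiply")
```

The inner loop body is the same small scalar pattern repeated for every limb. That regularity is what makes it a good fit for vectorisation, rather than a dedicated "bigint multiply" instruction.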
Some big-integer cryptographic primitives were implemented on top of the SVP64 vectorisation of the above scalar instructions:

* [Big integer multiplication primitive](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/powmod.py;h=7fc794685bebb1f3c2451c64da041a0e81143e29;hb=HEAD#l29)
* [Big integer division/modulus primitive](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/powmod.py;h=7fc794685bebb1f3c2451c64da041a0e81143e29;hb=HEAD#l131)
* [Big integer modular exponentiation primitive](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/bigint/powmod.py;h=7fc794685bebb1f3c2451c64da041a0e81143e29;hb=HEAD#l991)
* A [presentation](https://ftp.libre-soc.org/fosdem_2024/fosdem2024_bigint.pdf) on big integer arithmetic primitives on top of SVP64 vectorisation.

To test the above primitives in the ISA simulator, [install the developer environment](/HDL_workflow/devscripts), go to the `~/src/openpower-isa/src/openpower/decoder/isa` directory, and run `SILENCELOG=1 python3 test_aaa_caller_svp64_powmod.py` (warning: long running).

Relevant Milestone:

* [Bug #1044](https://bugs.libre-soc.org/show_bug.cgi?id=1044): 9. Demo of modulo exponent biginteger

**6) Implementation of a cryptographic algorithm (chacha20) using the new instructions and primitives**

One catastrophic mistake made by many cryptographic instruction implementations is to create over-specific instructions. "Multiply by 2 then subtract 5" is one example (the basis of a RISC-V chacha20 "accelerator"!).

Using our instructions, [our implementation of chacha20](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=crypto/chacha20/src/xchacha_encrypt_bytes_svp64.s;h=c1e0a8675cf679036b27de0bf83f8320ee36339a;hb=HEAD) has only TEN INSTRUCTIONS in the inner loop of the entire algorithm, a 50- to 100-fold reduction in code size.

See: [chacha20 design document](/openpower/sv/cookbook/chacha20).
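Part of why general-purpose instructions suffice is that the chacha20 quarter round needs nothing beyond 32-bit addition, XOR and rotate-left. The sketch below is a plain Python reference model of the quarter round and the column/diagonal double round as defined in RFC 8439. It is not the SVP64 assembly linked above, and the function and table names are illustrative only. The two index tables show the element-access pattern that a vectorised implementation has to reproduce. Expressing that pattern as index tables is the kind of job REMAP is intended for, but see the design document for the approach actually used.

```python
MASK32 = 0xFFFFFFFF


def rotl32(v: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((v << n) | (v >> (32 - n))) & MASK32


def quarter_round(state: list, a: int, b: int, c: int, d: int) -> None:
    """RFC 8439 quarter round on state[a], state[b], state[c], state[d], in place.

    Every step is the same triple of general-purpose operations:
    add (mod 2**32), XOR, rotate-left by a fixed amount.
    """
    for x, y, z, r in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
        state[x] = (state[x] + state[y]) & MASK32
        state[z] = rotl32(state[z] ^ state[x], r)


# Indices into the 16-word chacha20 state: column rounds, then diagonal rounds.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def double_round(state: list) -> None:
    """One column round followed by one diagonal round, in place."""
    for idx in COLUMNS + DIAGONALS:
        quarter_round(state, *idx)


if __name__ == "__main__":
    # Quarter-round test vector from RFC 8439, section 2.1.1.
    s = [0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567]
    quarter_round(s, 0, 1, 2, 3)
    print([hex(v) for v in s])
```

chacha20's 20 rounds are ten applications of `double_round`. The same quarter-round body is reused 80 times per block with only the indices changing, which is why a loop over a small, general instruction sequence can replace a large unrolled or algorithm-specific one.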
To run the chacha20 test in the ISA simulator, go to the `~/src/openpower-isa/crypto/chacha20` directory and run `make`, then `SILENCELOG=1 ./test-chacha20` (warning: long running). This unit test may also be run directly.

Relevant Milestone:

* [Bug 773](https://bugs.libre-soc.org/show_bug.cgi?id=773): 4. High-Level Demos of Cryptographic and Other Relevant Algorithms

**7) Binutils support for assembling the above instructions**

Currently, our reference Python assembler must be used to translate assembly files containing the new instructions. However, many (but not all) of the instructions were also added to the Binutils assembler (gas). See: [code](https://git.libre-soc.org/?p=binutils-gdb.git;a=shortlog;h=refs/heads/svp64). To install, run the `./binutils-gdb-install` script from the [developer scripts](/HDL_workflow/devscripts).

Further reading: [Bug 964 - binutils: support maddedu, divmod2du instructions](https://bugs.libre-soc.org/show_bug.cgi?id=964)

**8) A flexible self-contained HDL platform (ls2) for implementing a System-on-Chip on an FPGA or ASIC**

The ls2 platform can compile a Microwatt-compatible core (the reference Libre-SOC one, or Microwatt itself), together with selected peripherals (internal RAM, SPI, Ethernet, HyperRAM, etc.), for your target FPGA board (Arty A7-100t, VERSA_ECP5, other).

* [Documentation](/HDL_workflow/ls2/) (installation, running and uploading to an FPGA)
* [Code](https://git.libre-soc.org/?p=ls2.git;a=blob;f=src/ls2.py;h=48f6cca7e06ac16ec42e76c361945e3943dca4b2;hb=HEAD)

Relevant Milestone:

* [Bug 774](https://bugs.libre-soc.org/show_bug.cgi?id=774): 5.
  Equipment needed, such as FPGA boards and Ethernet PMODs

# Helpful information for Cryptorouter implementations

Given the work above, the information below is useful for anyone interested in working towards building a Cryptorouter FPGA or ASIC for oneself.

## Specifications, 2020

All of these are entirely Libre-Licensed or are to be written as Libre-Licensed:

* 300 MHz, single-core [Libre-SOC](https://git.libre-soc.org/?p=soc.git;a=blob;f=README.md;hb=HEAD) OpenPOWER CPU with [[openpower/sv/bitmanip]] extensions
* 180/130 nm (TBD)
* 5x [[shakti/m_class/RGMII]] Gigabit Ethernet PHYs with [SRAM](https://github.com/adamgreig/daqnet/blob/master/gateware/daqnet/ethernet/rmii.py) on-chip, built-in
* 2x USB [[shakti/m_class/ULPI]] PHYs
* Direct DMA interface (independent bulk transfer)
* [JTAG](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/debug/jtag.py;hb=HEAD), GPIO, I2C, PWM, UART, SPI, QSPI, SD/MMC
* On-board dual-ported SRAM (for Packet Buffers)
* Opencores [[shakti/m_class/sdram]]
* Wishbone interfaces to all peripherals
* [XICS ICP / ICS](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/interrupts/xics.py;hb=HEAD) Interrupt Controller

## Example packet transfer

* Packet comes in on RGMII port 1. Each PHY has its own dual-ported SRAM
* Packet is **directly** stored in internal (dual-ported) SRAM by the RGMII PHY itself
* Interrupt notification is sent to the processor (XICS)
* Processor inspects the packet over the Wishbone interface directly connected to the 2nd SRAM port
* Processor computes, based on decoding the ETH Frame, where the packet must be sent (which other RGMII port: e.g. Port 2)
* Processor initiates a Memory-to-Memory DMA transfer
* The DMA Memory-to-Memory transfer, using the Wishbone Bus, copies the ETH Frame from one on-board SRAM to the target on-board SRAM associated with Port 2
* DMA Engine generates an interrupt (XICS) to tell the CPU that the transfer is complete
* Processor notifies the target RGMII PHY to activate "send" of the frame out through target RGMII port 2

## Testing and Verification

We will need full HDL simulations as well as post-P&R simulations. These may be achieved as follows:

* ISA-level unit tests as well as Formal Correctness Proofs. Examples: a [bpermd proof](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/logical/formal/proof_bpermd.py;hb=HEAD) and individual unit tests for the [Logical pipeline](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/logical/test/test_pipe_caller.py;hb=HEAD)
* Simulation with some peripherals developed in C++ as Verilator modules
* nmigen-based OpenPOWER Libre-SOC core co-simulation, such as this unit test: [test_issuer.py](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_issuer.py;hb=HEAD)
* [cocotb pre/post PnR](https://git.libre-soc.org/?p=soc-cocotb-sim.git;a=tree;f=ls180;hb=HEAD) including GHDL, Icarus and Verilator (where best suited)

The actual instructions being developed (bitmanip) may therefore be unit tested prior to deployment. Following that, rapid simulations may be achieved by running ls2 (the same HDL may also easily be uploaded to an FPGA).

When it comes to Place-and-Route of the ASIC, the cocotb simulations may be used to verify that the GDS-II layout has not been "damaged" by the PnR tools. Peripheral functionality tests must also be part of the simulations, particularly using cocotb, to ensure that the peripherals remain functional after PnR.

Supercomputer access for compilation of Verilator and/or cxxrtl models is available through [[fed4fire]].
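As a flavour of the ISA-level unit testing described above, the sketch below checks two formulations of a 64-bit generalised reverse against each other. Grev is one of the bitmanip operations with an HDL implementation in deliverable 3. One formulation is a direct bit-index definition; the other is a staged butterfly of the kind hardware commonly uses. This is a hypothetical, self-contained Python example, not code from the Libre-SOC test suite. It assumes RISC-V-draft-style grev semantics: result bit i equals source bit (i XOR shamt), using the low 6 bits of shamt. `grev_ref` and `grev_butterfly` are illustrative names.

```python
import random

MASK64 = (1 << 64) - 1


def grev_ref(x: int, shamt: int) -> int:
    """Reference model: result bit i is source bit (i XOR shamt).

    Assumption: RISC-V-draft-style grev semantics, low 6 bits of shamt used.
    """
    shamt &= 63
    result = 0
    for i in range(64):
        if (x >> (i ^ shamt)) & 1:
            result |= 1 << i
    return result


# (k, mask) pairs: mask selects the lower k bits of every 2k-bit block.
_STAGES = [
    (1, 0x5555555555555555),
    (2, 0x3333333333333333),
    (4, 0x0F0F0F0F0F0F0F0F),
    (8, 0x00FF00FF00FF00FF),
    (16, 0x0000FFFF0000FFFF),
    (32, 0x00000000FFFFFFFF),
]


def grev_butterfly(x: int, shamt: int) -> int:
    """Staged form: stage k swaps adjacent k-bit blocks when bit k of shamt is set."""
    x &= MASK64
    for k, mask in _STAGES:
        if shamt & k:
            x = ((x & mask) << k) | ((x >> k) & mask)
    return x & MASK64


def self_test(trials: int = 200) -> None:
    """Randomised equivalence check between the two formulations."""
    rng = random.Random(0)
    for _ in range(trials):
        x = rng.getrandbits(64)
        shamt = rng.randrange(64)
        assert grev_ref(x, shamt) == grev_butterfly(x, shamt)


if __name__ == "__main__":
    self_test()
    # shamt = 63 reverses all 64 bits; shamt = 56 reverses the byte order.
    print(hex(grev_butterfly(1, 63)))                   # 0x8000000000000000
    print(hex(grev_butterfly(0x0102030405060708, 56)))  # 0x807060504030201
```

Each butterfly stage XORs one bit of the bit index, so the stages commute and their composition must match the bit-index definition for every shamt. A cheap randomised comparison like this complements, but does not replace, the formal proofs listed earlier.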