This is the artifact of the paper Optimized Software Implementation of Keccak, Kyber, and Dilithium on RV{32,64}IM{B}{V}. You can cite this work like this:
@article{ZhangYHK24,
author = {Jipeng Zhang and
Yuxing Yan and
Junhao Huang and
{\c{C}}etin Kaya Ko{\c{c}}},
title = {{Optimized Software Implementation of Keccak, Kyber, and Dilithium
on RV\{32,64\}IM\{B\}\{V\}}},
journal = {{IACR} Trans. Cryptogr. Hardw. Embed. Syst.},
volume = {2025},
number = {1},
pages = {632-655},
year = {2024},
month = {Dec.},
url = {https://tches.iacr.org/index.php/TCHES/article/view/11941}
}
If your goal is to reproduce the experimental results in our paper, please refer to the ches2025 branch. The current branch has several updates compared to the ches2025 branch:
- The code for the NTT RVV implementation has been refactored.
- Support for the SpacemiT X60 core with a VLEN of 256 bits has been added.
- Updates have been made in accordance with the latest FIPS 203 standard, primarily drawing on the pq-crystals/kyber repository.
This project reused public-domain code from the following repositories: https://github.com/pq-crystals/kyber and https://github.com/pq-crystals/dilithium.
This project is compatible with two development boards: the CanMV-K230 Development Board equipped with the XuanTie C908 core (VLEN=128 bits) and the SpacemiT Key Stone Development Board featuring the SpacemiT X60 core (VLEN=256 bits). You can select either one to run this project.
Please note that the CanMV-K230 development board supports 32-bit RISC-V, while the SpacemiT Key Stone does not.
- Development Board: CanMV-K230 development board, which includes a C908 core.
- Cross Compiler: Xuantie-900 linux-5.10.4 glibc gcc Toolchain V2.8.0 B-20231018.
- Host Machine: A Linux-capable machine used for cross-compiling. Note that the Linux running on the C908 is a minimal version and is not suitable for direct development; therefore, cross-compilation is necessary.
- Router: A router is used to set up a local area network (LAN), allowing the host machine to transfer files to the development board via scp and connect to the development board via ssh.
- Network Topology: The host machine and the K230 development board are on the same local network. The host machine cross-compiles executable files for the C908 core and transfers them to the development board using the scp tool. Then, the host machine connects to the development board via ssh to run the executable files and obtain the experimental results.
- Necessary Modifications: All Makefiles in this project have hard-coded information such as the IP address of the development board within the local network (192.168.123.99), the username on the Linux running on the C908 (root), and the directory on the C908 where the executable files are stored (/sharefs). These details need to be modified according to your specific setup, especially the IP address of the development board within the local network.
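One way to apply these modifications is to rewrite the hard-coded address in place. The following is a minimal sketch, not part of the repository: it operates on a throwaway Makefile in a temporary directory, and both the replacement address 192.168.1.50 and the assumed form of the TARGET_IP assignment are illustrative assumptions.

```shell
# Sketch only: works on a throwaway file, not on the real project Makefiles.
OLD_IP="192.168.123.99"   # address hard-coded in the project Makefiles
NEW_IP="192.168.1.50"     # hypothetical address of your own board

demo_dir="$(mktemp -d)"
# Assumed form of the assignment; the real Makefiles may define it differently.
printf 'TARGET_IP = %s\n' "$OLD_IP" > "$demo_dir/Makefile"

# Escape the dots so that sed matches them literally.
old_ip_pattern="$(printf '%s' "$OLD_IP" | sed 's/\./\\./g')"

# Substitute in place, keeping a .bak copy of the original file.
sed -i.bak "s/$old_ip_pattern/$NEW_IP/g" "$demo_dir/Makefile"

updated_line="$(cat "$demo_dir/Makefile")"
echo "$updated_line"
```

GNU make also gives a variable set on the command line precedence over an ordinary assignment in the Makefile, so make all TARGET_IP=192.168.1.50 may work without editing any file, assuming the Makefiles take the address from the TARGET_IP variable.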
We use the CanMV-K230 development board based on the Kendryte K230 SoC. This SoC adopts a big.LITTLE heterogeneous design, and we only use the big core in this work. The big core is based on the XuanTie C908 processor from T-Head Semiconductor, with the following information:
- Instruction set: RV{32,64}GCBV, vector extension version is 1.0
- Frequency: 1.6GHz
- Cache: 32KB L1 instruction/data cache, 256KB L2 cache
- Microarchitecture: Dual-issue, 9-stage in-order pipeline
- Memory: 512MB
- Operating system: Linux version 5.10.4
- Compiler: riscv64-unknown-linux-gnu-gcc (Xuantie-900 linux-5.10.4 glibc gcc Toolchain V2.8.0 B-20231018) 10.4.0, whose download link is here. You can find more related resources at XuanTie community.
For a tutorial on how to use this development board, we refer readers to K230 doc and K230 sdk. We only emphasize some key configurations here. The command we use to make the system image for the development board is make CONF=k230_evb_only_linux_defconfig, which generates a Linux OS with BuildRoot for the big core, so that we can run RVV instructions on the big core.
The current SDK provided by this development board runs Linux (generated by BuildRoot) on the big core, which is a simplified Linux system suitable for embedded scenarios. It does not support a complete set of dynamic libraries. Therefore, under the current configuration, only executable files built with the -static option can run on the development board. For the sake of a fair comparison, all experiments in this work use the -static option.
For the configuration of the SpacemiT Key Stone development board, please refer to the website. This development board supports a full-fledged Linux operating system, so there's no need for additional preliminaries.
Within a specific directory, such as Kyber/RV64, you'll find the Makefile_x60 file. This file offers support for the SpacemiT Key Stone development board. To use it, run a command such as make all -f Makefile_x60.
The Makefile provides support for the CanMV-K230 development board. In the following, we'll take it as an example for explanation.
We will use cpi/Makefile as an example to explain our basic project structure. In the dependency list of the all target, besides the executable files to be compiled, there is also the target out/scp_speed, which is implemented as follows:
out/scp_speed: \
	out/cpi_rv32imb out/cpi_rv64imb out/cpi_rvv \
	out/cpi_rv_vgroup \
	out/cpi_ntt_rv32imv
	$(SCP) $^ $(TARGET_USER)@$(TARGET_IP):$(TARGET_DIR)/
	@echo 1 > $@
This means it will transfer all the generated executable files to the development board via scp. In summary, our make all command not only compiles the executable files but also uses scp to transfer all executables to the development board.
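The last recipe line of out/scp_speed, echo 1 > $@, writes a small marker file so that make can tell whether the transfer is still current: a target is remade when it is missing or when any prerequisite is newer than it. The sketch below mimics that timestamp rule in plain shell on dummy files. The helper names needs_transfer and status_of are illustrative, the -nt test is a common shell extension rather than strict POSIX, and no scp call is made.

```shell
# Sketch of make's "remake if missing or older than a prerequisite" rule.
# needs_transfer and status_of are hypothetical helpers, not repository code.
needs_transfer() {
    marker="$1"; shift
    [ -e "$marker" ] || return 0            # no marker yet: transfer needed
    for candidate in "$@"; do
        if [ "$candidate" -nt "$marker" ]; then
            return 0                        # rebuilt after the last transfer
        fi
    done
    return 1                                # marker is newer than every file
}

status_of() {
    if needs_transfer "$@"; then echo "transfer"; else echo "up-to-date"; fi
}

demo_dir="$(mktemp -d)"
binary="$demo_dir/cpi_rv32imb"
stamp="$demo_dir/scp_speed"

touch -t 202401010000 "$binary"             # binary built
first="$(status_of "$stamp" "$binary")"     # marker does not exist yet

touch -t 202401020000 "$stamp"              # marker written after a transfer
second="$(status_of "$stamp" "$binary")"

touch -t 202401030000 "$binary"             # binary rebuilt later
third="$(status_of "$stamp" "$binary")"

echo "$first $second $third"
```

The explicit timestamps passed to touch -t keep the demonstration independent of file system timestamp resolution.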
Additionally, our cpi/Makefile defines the target run_all, as shown below:
run_all: run_rv32imb run_rv64imb run_rvv run_rv_vgroup run_ntt_rv32imv
Taking the run_rv32imb target as an example:
run_rv32imb: out/scp_speed
	ssh $(TARGET_USER)@$(TARGET_IP) "cd $(TARGET_DIR) && ./cpi_rv32imb" > cpi_rv32imb.txt
This means that the host computer connects to the development board via ssh, runs the corresponding executable, and redirects the output to a txt file on the host computer.
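The two Makefile steps can also be carried out by hand. The sketch below builds the equivalent scp and ssh command lines from the same three settings and only prints them. DRY_RUN and the run helper are illustrative names that do not appear in the repository, and the values are the defaults hard-coded in the project Makefiles.

```shell
TARGET_USER="root"
TARGET_IP="192.168.123.99"
TARGET_DIR="/sharefs"
DRY_RUN=1   # set to 0 to actually contact the board

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1 (out/scp_speed): copy the executable to the board.
scp_cmd="$(run scp out/cpi_rv32imb "$TARGET_USER@$TARGET_IP:$TARGET_DIR/")"

# Step 2 (run_rv32imb): run it remotely. On a real run, redirect the output
# with "> cpi_rv32imb.txt" to keep the results on the host.
ssh_cmd="$(run ssh "$TARGET_USER@$TARGET_IP" "cd $TARGET_DIR && ./cpi_rv32imb")"

echo "$scp_cmd"
echo "$ssh_cmd"
```

Printing the commands first makes it easy to check the address, user, and directory before anything is sent to the board.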
Therefore, the basic steps to reproduce our experimental data are to navigate to the appropriate directory and then run the make run_all command. You can also use make all -j; make run_all, where make all -j runs the compilation jobs in parallel to speed up the build.
Table 1 in our paper presents the latency and CPI (Cycles Per Instruction) for various instructions of the C908 core. Some of these results are directly obtained from the C908 user manual. Additionally, we performed a series of microbenchmarks, which can be found in the cpi/ directory. The source code files are identified by the .S and .c extensions, and the test results are in files with the .txt extension.
The commands you need to run are as follows:
cd cpi
make all -j
make run_all
The experimental results are output to the txt files in the cpi/ directory.
For a detailed explanation of the principles and results of the microbenchmarks, please refer to cpi/README.md.
Table 2 in our paper presents the experimental results for various Keccak-f1600 implementations. The data for the ARM Cortex-A55 and ARM Cortex-M4 platforms are directly taken from the corresponding papers, while the remaining data were obtained by running the following commands:
cd sha3
make all -j
make run_all
The experimental results are output to the txt files in the sha3/ directory.
The data in results_ko_rv32im.txt are from testing Ko Stoffelen's implementation on our platform with the RV32IM ISA. The data in results_riscvcrypto_rv64imb.txt are from testing the RISCV-Crypto implementation on our platform with the RV64IMB ISA. Files with the _ref.txt suffix contain test results of reference implementations on the corresponding ISAs. The remaining .txt files contain the test results of the optimized implementations from our work.
Table 3 in our paper reports the experimental results of various NTT implementations. The data for the ARM Cortex-A72 and ARM Cortex-M4 platforms are directly taken from the corresponding papers, while the remaining data were obtained by running the following commands:
cd ntt
make all -j
make run_all
The experimental results are output to the ntt/speed_ntt.txt file, which contains more comprehensive data than Table 3 in the paper.
The relationship between the entries in Table 3 and the experimental results in the ntt/speed_ntt.txt file is as follows:
| Table 3 | speed_ntt.txt |
|---|---|
| Kyber [HZZ+24] on RV32IM | speed_singleissue_kyber_plantard_ntt_rv32 |
| Kyber Our on RV32IM | speed_dualissue_kyber_plantard_ntt_rv32 |
| Kyber Our on RV64IM | speed_dualissue_kyber_plantard_ntt_rv64 |
| Kyber Our on RVV | speed_dualissue_kyber_mont_ntt_rvv |
| Dilithium Our on RV32IM | speed_dualissue_dilithium_mont_ntt_rv32 |
| Dilithium Our on RV64IM | speed_dualissue_dilithium_plant_ntt_rv64 |
| Dilithium Our on RVV | speed_dualissue_dilithium_mont_ntt_rvv |
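Because ntt/speed_ntt.txt holds more entries than Table 3, it can help to filter it down to the labels listed in the mapping. The sketch below does this with grep on a placeholder file. The line layout of the stand-in file and the some_other_benchmark entry are assumptions made for illustration, and the real file may be organized differently.

```shell
demo_dir="$(mktemp -d)"

# Labels copied from the mapping (Kyber rows only, for brevity).
cat > "$demo_dir/labels.txt" <<'EOF'
speed_singleissue_kyber_plantard_ntt_rv32
speed_dualissue_kyber_plantard_ntt_rv32
speed_dualissue_kyber_plantard_ntt_rv64
speed_dualissue_kyber_mont_ntt_rvv
EOF

# Placeholder stand-in for ntt/speed_ntt.txt; the real layout may differ.
cat > "$demo_dir/speed_ntt.txt" <<'EOF'
speed_singleissue_kyber_plantard_ntt_rv32: <cycles>
speed_dualissue_kyber_plantard_ntt_rv32: <cycles>
some_other_benchmark: <cycles>
EOF

# -F treats each label as a fixed string; -f reads the labels from a file.
matches="$(grep -F -f "$demo_dir/labels.txt" "$demo_dir/speed_ntt.txt")"
match_count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"

echo "$matches"
```

On a real run, point the grep command at ntt/speed_ntt.txt and, if the labels span several lines of output, add a context option such as -A to show the lines that follow each match.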
Table 4 in our paper presents the performance comparison of various Kyber and Dilithium implementations. The data for the ARM Cortex-A72 and ARM Cortex-M4 platforms are directly taken from the corresponding papers.
To reproduce the experimental data for Kyber and Dilithium, navigate to the corresponding directory and run the following commands:
make all -j; make speed -j
make run_diff_vectors
make run_speed
The executable files generated by the all target are primarily used to verify the correctness of the Kyber or Dilithium implementations and to generate test vectors. The executable files generated by the speed target are used for performance testing.
The make run_diff_vectors command generates test vectors and compares them with those generated by the reference implementation to verify the correctness of our implementation. The make run_speed command performs performance testing on Kyber or Dilithium, with the results output to the corresponding txt files.
If you only want to perform performance testing, simply run the following commands:
make speed -j
make run_speed
To reproduce the [HZZ+24] on RV32IM results, please refer to the ches2025 branch.
For the performance of the Kyber reference implementation on various ISAs, run the following commands:
cd Kyber/ref
make speed -j
make run_speed
The experimental results will be output to the txt files in the Kyber/ref directory.
For the performance of our optimized Kyber implementation on RV32, run the following commands:
cd Kyber/RV32
make speed -j
make run_speed
The experimental results will be output to the txt files in the Kyber/RV32 directory.
For the performance of our optimized Kyber implementation on RV64, run the following commands:
cd Kyber/RV64
make speed -j
make run_speed
The experimental results will be output to the txt files in the Kyber/RV64 directory.
For the performance of the Dilithium reference implementation on various ISAs, run the following commands:
cd Dilithium/ref
make speed -j
make run_speed
The experimental results will be output to the txt files in the Dilithium/ref directory.
For the performance of our optimized Dilithium implementation on RV32, run the following commands:
cd Dilithium/RV32
make speed -j
make run_speed
The experimental results will be output to the txt files in the Dilithium/RV32 directory.
For the performance of our optimized Dilithium implementation on RV64, run the following commands:
cd Dilithium/RV64
make speed -j
make run_speed
The experimental results will be output to the txt files in the Dilithium/RV64 directory.
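The six command sequences for Kyber and Dilithium differ only in the directory, so they can be driven by a single loop from the repository root. The following is a sketch rather than a script shipped with the project: DRY_RUN is an illustrative switch, and with its default value the loop only prints what it would run.

```shell
DRY_RUN=1   # set to 0 to run the builds and benchmarks for real
visited=0

for scheme in Kyber Dilithium; do
    for variant in ref RV32 RV64; do
        dir="$scheme/$variant"
        if [ "$DRY_RUN" = "1" ]; then
            echo "cd $dir && make speed -j && make run_speed"
        else
            # The subshell keeps the working directory unchanged for the
            # next iteration; a failure is reported and the loop continues.
            ( cd "$dir" && make speed -j && make run_speed ) \
                || echo "benchmark failed in $dir" >&2
        fi
        visited=$((visited + 1))
    done
done

echo "directories covered: $visited"
```

Chaining the steps with && means run_speed is attempted only when the build in that directory succeeded.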