# :closed_lock_with_key: A Physical Unclonable Function for _any_ FPGA [![License](https://img.shields.io/github/license/stnolting/fpga_puf)](https://github.com/stnolting/fpga_puf/blob/main/LICENSE) [![DOI](https://zenodo.org/badge/428800193.svg)](https://zenodo.org/badge/latestdoi/428800193) * [Introduction](#Introduction) * [Theory of Operation](#Theory-of-Operation) * [Post-Processing](#Post-Processing) * [Top Entity](#Top-Entity) * [Evaluation](#Evaluation) * [Setup](#Setup) * [ID Results](#ID-Results) * [Reliability](#Reliability) * [Hardware Utilization](#Hardware-Utilization) * [Summary](#Summary) * [TODO](#TODO) ## Introduction A physical unclonable function (PUF) provides a digital _fingerprint_ that can be used as **unique identifier** for security-related applications like authentication. The `fpga_puf` hardware module provides an unclonable 96-bit unique identifier (ID) that is defined by the target chip's semiconductor characteristics. It is implemented in a technology-independent way that does not use any device-specific macros, primitives or attributes so it can be implemented on _any_ FPGA (verified using Intel Quartus Prime, Lattice Radiant and Xilinx Vivado). :key: The PUF ID is _unique_ for the combination of bitstream and _one_ specific FPGA. The same bitstream will lead to a different PUF ID on a different FPGA of same type. If the bitstream of a specific FPGA is changed (by changing design logic) the PUF ID of that specific FPGA will also change. **Key features** * [x] technology-, vendor- and platform-agnostic implementation * [x] easy to use * [x] tiny hardware footprint (less than 200 LUTs) * [ ] secure and reliable (:construction: work-in-progress) :loudspeaker: This is an ongoing research / proof-of-concept project. Feedback from the community is highly appreciated (see notes in section ["TODO"](#TODO))! ## Theory of Operation Each bit of the 96-bit PUF ID is generated by an individual **PUF cell**. Each cell consists of an asynchronous element that is based on a simple ring oscillator. The oscillator is constructed from a single inverter that negates it's own output signal. This feedback loop is interrupted by a latch. If the latch is open (=transparent) the oscillator starts oscillating. If the latch is closed, it stores the last state of the oscillator (high or low). The latch also provides a reset to bring each cell into a defined state. The frequency of each oscillator is defined by the mapping of the logic cell and the according routing. Both factors are constant for a specific bitstream. Furthermore, the frequency is also randomly (but constantly for a given setup) "tuned" by the chip's semiconductor characteristics. These are caused by tiny productions fluctuation (for example the capacitance / delay of a routing wire is affected by variation in oxide thickness). The frequency of each oscillator is considered to be fixed. Hence, sampling several periods within a fixed time window will always end in the same state (oscillator output is high or low). Temperature drift disturbs all oscillators in the (nearly) same way and has to be computationally eliminated/compensated by the [post-processing](#Post-Processing). Since `fpga_puf` does not use any device-specific attributes or primitives, there is a risk that the synthesis toolchain will remove the asynchronous PUF cells or collapse them into a single PUF cell ("optimize away"). Therefore, the design uses a shift register to control reset and the latch open/close phase of each cell individually and distributed over time. This concept is based on the [neoTRNG True Random Number Generator](https://github.com/stnolting/neoTRNG) and allows a **platform-independent implementation**. Whenever a new sampling of the PUF ID is started, a 96+1 bit wide shift register. A single '1' is applied to the least significant bit that travels throughout the whole shift register chain during operation. The PUF cell's control signals (reset and latch control) are connected to this shift register. Each shift register bit `i` controls the reset of PUF cell `i` and the latch control of PUF cell `i+1`. Thus, a cell's reset is active for one clock cycle. In the next clock cycle the cell's latch is opened for exactly one clock cycle allowing the oscillator to run. In the next clock cycle the latch is closed again and has captured the oscillator's last state. By this scheme only one oscillator is active at a time reducing potential distortion by oscillator cross-talk (i.e. the oscillation of one osc. affect the oscillations of another osc.). When the single one-bit reaches the most significant bit (bit 96) of the shift register the sampling process is completed and the latch states are sampled into a register that provides the obtained _raw_ PUF ID. The following figure is taken from the Intel Quartus Prime technology viewer showing a single PUF cell (cell #44). The oscillator as well as the latch are mapped to a single LUT3 element in the FPGA. The output of this asynchronous element is sampled by a simple register. Hence, a single PUF cell only requires one LUT and one FF. (`DATAA` is the latch reset (high-active), `DATAC` is the latch control (open when set), `DATAD` is the feedback of the inverter) ![puf_cell](https://raw.githubusercontent.com/stnolting/fpga_puf/main/img/intel_technology_cell.png) ### Post-Processing Some bits of the _raw_ PUF ID might be quite noisy, which complicates defining a _stable_ ID that does not change over time and over a broad range of operation conditions (e.g. temperature). A lot(!) of different approaches can be found in literatures. One promising approach is the use of error-correction codes to "stabilizes" an initially determined ID. However, these concepts are out of (my) scope so I used a simple "averaging" concept for this example. My approach samples the _raw_ PUF ID several times (for example 4096 times) and checks how often each bit of the ID is set across all sampled IDs. If an ID bit is set in more then half of the samples it is considered to be a static `1`, otherwise it is considered to be a static `0`. Since a few bits of the raw ID might be quite noisy, a hysteresis is used to eliminate those bits from the final PUF: if a bit is set or cleared more often than a certain _threshold_ it is considered "stable" in the final ID. These lower and upper hysteresis threshold define an _uncertain_ band between them. For the final ID all bits tha fall into this uncertainty band are masked to be always zero. My post-processing concept can be found in [`sw/main.c`](https://github.com/stnolting/fpga_puf/blob/main/sw/main.c). The source file provides more-detailed comments. ## Top Entity The top entity of the PUF IP module is `fpga_puf` ([`rtl/fpga_puf.vhd`](https://github.com/stnolting/fpga_puf/blob/main/rtl/fpga_puf.vhd)). It can be directly instantiated without the need for any special libraries. ```vhdl entity fpga_puf is port ( clk_i : in std_ulogic; -- global clock line rstn_i : in std_ulogic; -- SYNC reset, low-active trig_i : in std_ulogic; -- set high for one clock to trigger ID sampling busy_o : out std_ulogic; -- busy when set (sampling ID) id_o : out std_ulogic_vector(95 downto 0) -- PUF ID (valid after sampling is done) ); end fpga_puf; ``` :warning: Note that the PUF cannot be simulated due to it's combinatorial loops. ## Evaluation To evaluate this concept and the quality/reliability of the PUF IDs I am using the [NEORV32](https://github.com/stnolting/neorv32) as processor platform. The `fpga_puf` IP module is added to the processor's "Custom Functions Subsystem (CFS)", which is a _blank_ template for implementing custom application-specific co-processors. The setup is synthesized for different FPGAs using different toolchains (tested with Intel Quartus Prime, Lattice Radiant and Xilinx Vivado). A specific bitstream generated and programmed into _several_ FPGAs of the _same_ type to check for chip-specific ID variations. On one chip the ID is generated several times to check if it is "stable over time" and thus, reliable. ### Setup The application-specific PUF-wrapping CFS can be found in [`rtl/neorv32_cfs.vhd`](https://github.com/stnolting/fpga_puf/blob/main/rtl/neorv32_cfs.vhd). Make sure to use this CFS file instead of the default NEORV32 CFS source file when reproducing this setup. The CFS uses four memory-mapped registers to interface the PUF ID module: | CSF register address (_C_ access macro) | Access | Function | |:----------------------------------------|:------:|:------------------| | `NEORV32_CFS.REG[0]` | `r/w` | Control register | | `NEORV32_CFS.REG[1]` | `r/-` | PUF ID bits 31:0 | | `NEORV32_CFS.REG[2]` | `r/-` | PUF ID bits 63:32 | | `NEORV32_CFS.REG[3]` | `r/-` | PUF ID bits 95:64 | Bit _0_ of the control register (`PUF_CTRL_EN`) control the synchronous reset signal of the `fpga_puf` module (`rstn_i`). Setting this bit will activate the module, clearing it will put the module into reset state. Bit _1_ of the control register (`PUF_CTRL_SAMPLE`) is used to start the sampling of the ID. Writing one to it will set the trigger signal (`trig_i`) high for one cycle. Reading this control register bit will return the busy state of the `fpga_puf` module (`busy_o`). :floppy_disk: The low-level hardware-accessing functions are implemented in [`sw/fpga_puf_neorv32_cfs.c`](https://github.com/stnolting/fpga_puf/blob/main/sw/fpga_puf_neorv32_cfs.c). The header file [`sw/fpga_puf_neorv32_cfs.h`](https://github.com/stnolting/fpga_puf/blob/main/sw/fpga_puf_neorv32_cfs.h) provides the function prototypes, a PUF ID data type and also the NEORV32 CFS register mappings and bit definitions. The PUF test program ([`sw/main.c`](https://github.com/stnolting/fpga_puf/blob/main/sw/main.c)) is used to sample the chip's PUF ID and transmit it via UART to a terminal program: ``` Physical Unclonable Functions <fpga_puf> Test PUF implemented as NEORV32 Custom Functions Subsystem (CFS) Press any key to start PUF test (8 runs with 4096 samples each). Starting test... Run 0 ID: 0x37c0480063021011988c0095 Run 1 ID: 0x37c0480063021011988c0095 Run 2 ID: 0x37c0480063021011988c0095 Run 3 ID: 0x37c0480063021011988c0095 Run 4 ID: 0x37c0480063021011988c0095 Run 5 ID: 0x37c0480063021011988c0095 Run 6 ID: 0x37c0480063021011988c0095 Run 7 ID: 0x37c0480063021011988c0095 Test completed. ``` :floppy_disk: A pre-compiled NEORV32 executable of this test program is also available in this repository: [`sw/neorv32_exe.bin`](https://github.com/stnolting/fpga_puf/blob/main/sw/neorv32_exe.bin) (compiled for a minimal `rv32i` NEORV32 CPU) ### ID Results So far I have tested the PUF module on three Lattice iCE40 UltraPlus FPGAs and four Intel Cyclone IV FPGAs (shout-out to [@emb4fun](https://github.com/emb4fun) - thank you for your help!) and just one Xilinx Artix-7 FPGA. The ID was generated 8 times on each chip to "check" if it is reproducible (drawback: just in a very short time window...). The specific FPGA types are shown in section [Hardware Utilization](#Hardware-Utilization). | FPGA | PUF ID ("Fingerprint Key") | |:--------------------------------|:-----------------------------| | Lattice iCE40 UltraPlus - **1** | `0x37c0480063021011988c0095` | | Lattice iCE40 UltraPlus - **2** | `0x592063e50118040c2000112b` | | Lattice iCE40 UltraPlus - **3** | `0x941c82505112323600b0c221` | | Intel Cyclone IV - **1** | `0x9fbe33dc9021be3156cae31b` | | Intel Cyclone IV - **2** | `0xa726db495a2a8346b30f2000` | | Intel Cyclone IV - **3** | `0x413fe25557311be7f9f5edfb` | | Intel Cyclone IV - **4** | `0xca8ee8dc7945fa60f770e1fa` | | Xilinx Artix-7 - **1** | `0x6faaf93af77cf77fef91fe79` | So far, the IDs are _unique_ for each tested FPGA! ### Reliability **:construction: work in progress :construction:** A long-time test is used to sample and check _raw_ IDs (not the pre-processed ones) for _stability over time_. Note that my setup is placed in pretty stable environment conditions. The test generates an _initital_ ID `I` right at the beginning of execution. In an endless loop two consecutive IDs `A` and `B` are sampled (a single "Run"). To evaluate the the "instability" the Hamming distance (number of bits that are not identical across two samples) is computed. This test computes the _relative_ Hamming distance of the two consecutive samples `A` and `B` (= `H(A,B)`) and also the relative Hamming distance between the initial sample and sample `A` (= `H(I,A)`) from the current run. Furthermore, the maximum ob both distances is computed over all runs (`H_max(A,B)` and `H_max(I,A)`). To identify _noisy bits_ an "accumulated bit-change-mask" `F` is computed. This mask is computed for every run by XOR-ing the obtained samples `A` and `B` with the initial ID `I`. The mask from a run is OR-ed with the mask from the previous run to accumulate the noisy bits over time. Cut-out of the test log: ``` ... Run 43373: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9423beb156cae31b, B=0xbfbe3bfcb021beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=3, H_max(A,B)=9 - H(I,A)=19, H_max(I,A)=23 Run 43374: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9435beb156eae31b, B=0xbfbe3bfcb431beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=2, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23 Run 43375: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156caeb1b, B=0xbfbe3bdcb421beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=20, H_max(I,A)=23 Run 43376: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156eaeb1b, B=0xbfbe3bfcb021beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23 Run 43377: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9021beb156caeb1b, B=0xbfbe3bdc9425beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=18, H_max(I,A)=23 ... ``` This very first test was run for approx. half an hours making ~20 runs per second. It shows a maximal Hamming distance of 23 bits while 33 bits (`F`) tend to be noisy (compared to the initial ID `I` sampled once right at the beginning of the test). The number of noisy bits increases slowly over time, probably because of increasing chip temperature. It tends to increase slower over time, so there might be a saturation at some point. The temperature of the PUF cells is most important because that impacts the oscillator frequencies. The PUF cells heat up due to the _continuous_ PUF operation. The PUF from this test setup provides 96-33=**63** bits that seem to be stable over the observed time. This also means that the usable key space is reduced (63-bit instead of 96-bit). ### Hardware Utilization Mapping results for the custom function subsystem (CFS), the `fpga_puf` module and a single PUF cell. | Lattice ice40 UltraPlus `iCE40UP5K-SG48I` @24MHz | Logic Cells | Logic Registers | |:-----------------------------------------------------------|------------:|----------------:| | neorv32_cfs_inst_true.neorv32_cfs_inst | 171 (71) | 232 (35) | | -fpga_puf_inst | 100 (4) | 197 (101) | | --fpga_puf_cell_inst[0].fpga_puf_cell_inst_i | 1 (1) | 1 (1) | | Intel Cyclone IV `EP4CE22F17C6N` @100MHz | Logic Cells | Logic Registers | |:-----------------------------------------------------------|------------:|----------------:| | neorv32_cfs:\neorv32_cfs_inst_true:neorv32_cfs_inst | 241 (43) | 232 (35) | | -fpga_puf:fpga_puf_inst | 198 (102) | 197 (101) | | --fpga_puf_cell:\fpga_puf_cell_inst:0:fpga_puf_cell_inst_i | 1 (1) | 1 (1) | | Xilinx Artix-7 `XC7A35TICSG324-1L` @100MHz | Logic Cells | Logic Registers | |:-------------------------------------------------------------|------------:|----------------:| | neorv32_cfs_inst_true.neorv32_cfs_inst (neorv32_cfs) | 133 | 328 | | -fpga_puf_inst (fpga_puf) | 133 | 293 | | --fpga_puf_cell_inst[0].fpga_puf_cell_inst_i (fpga_puf_cell) | 1 | 2 (Latch + FF) | :information_source: Xilinx Vivado can detect the PUF cell's latch and infers a FF primitive, which also provides a latch mode. This does not compromise the functionality of the PUF. ## Summary The PUF IDs are unique for each FPGA and can be successfully implemented on different FPGAs (Xilinx, Intel, Lattice). The **raw PUF ID** is only partly reproducible, because some bits tend to be quite noisy (see results above). A better post-processing algorithm using error-correction codes should be able to compensate for that. This is **work in progress**. The latest test showed that there is a certain number of bits that are quite noisy so they cannot be used to determine the PUF IF. Obviously, this reduces the maximum key space (for example only 63-bit of the 96-bit ID are usable). To circumvent this, the PUF can be made arbitrarily wide (for example providing a 256-bit raw ID). Of course this will also introduce additional unusable noisy bits, but we expect that the percentage of noisy bits within the PUF ID is a device-specific constant. Each additional PUF ID bit adds 2 FFs (one for the SREG and one for the PUFF cell) and 1 LUT. For each additional bit the ID sampling time is increased by one clock cycle. :loudspeaker: If you have any kind of ideas or feedback (for example how to improve reliability) feel free to open a new [issue](https://github.com/stnolting/fpga_puf/issues). I am also happy to get more data, so if you have ported the design on another FPGA you can open a [pull request](https://github.com/stnolting/fpga_puf/pulls) and add your results. ## TODO * test more FPGAs * check stability in a controlled environment (e.g. temperature chamber) * evaluate more sophisticated post-processing algorithms * sample more data from more long-time test * faulty bits over time (and temperature) * hamming distance over time (and temperature) * ... * to be continued...