Protecting secret keys in networked devices with table encoding against power analysis attacks

Abstract

Secret keys in networked devices are increasingly exposed to power analysis attacks, which have become practical thanks to advances in statistical analysis and require only a simple experimental setup. Recently, OpenSSL and CoreBitcoin running on Android and iOS were broken by power analysis. Sensors and actuators can also be attacked, threatening users' privacy and security. In response, power-analysis-resistant implementations of cryptographic algorithms in networked devices have received considerable attention. Masking schemes have been developed to protect cryptographic implementations against side-channel analysis (SCA) attacks. However, first-order masking is vulnerable to second-order differential power analysis (2ODPA), and the current countermeasures against 2ODPA are expensive to implement. Performance degrades further when the cryptographic algorithm mixes Boolean and arithmetic operations. In this paper, we propose a new countermeasure against SCA attacks. Our scheme randomizes all intermediate values of a block cipher by encoding the functions of the algorithm into lookup tables, which makes the implementation resistant to power analysis. We apply our scheme to the block cipher HIGHT. Our protected implementation of HIGHT takes only 1.79 times as long as the straightforward implementation and needs about 25 kbytes of memory to store the lookup tables.

Keywords

Side channel attack, power analysis, countermeasure, block cipher

1. Introduction

1.1. Background

In the black-box model, the classic attack model in symmetric-key cryptography, an attacker is assumed to access only the inputs and outputs of the cipher, using known or chosen plaintexts or ciphertexts. Since Kocher et al. [9] introduced simple power analysis (SPA) and differential power analysis (DPA), a large body of research has been published on methods to protect against these power analysis attacks. SPA, one of the side-channel analysis (SCA) attacks, involves a visual examination of graphs of the current used by a device over time. Variations in power consumption occur as the device performs different operations. A single power trace is enough to launch an attack, provided that the attacker knows the wave patterns of the target operations. A DPA attacker computes the intermediate values of the target operation and statistically analyzes power consumption measurements from the attacked device, and therefore does not require detailed information about the device. SCA is sometimes classified as a grey-box model because the attacker has more information than in the black-box model. Generally speaking, SCA is a physical attack that recovers secret data from leaked information, such as power consumption, electromagnetic emanation, or timing, obtained through the physical implementation of a cryptosystem. In this sense, SCA includes not only power analysis but also fault attacks and electromagnetic analysis. Here we focus on power analysis, where an attack uses the statistical dependency between the intermediate values and the leaked information to determine the secret key related to the attacked intermediate values.¹

¹ A preliminary version of this paper appeared in World Conference on Information Security Applications 2014, August 25–27, Jeju, Korea. This version includes a concrete analysis and supporting experimental results.

Previously, SCA was considered too costly because of its expensive experimental setup. In particular, a specially designed board to communicate with IC cards and an oscilloscope to measure the power consumption of the attacked device were expensive and not easily available. However, recent research shows that networked computers and smart phones can be attacked with cheap and readily available equipment. Figure 1 shows the experimental setup used to mount an SCA attack on a laptop [4]. The radio receiver is placed near the target and its output is connected to the input of a smart phone that records the signal. In particular, it has been demonstrated that an attacker can reveal the secret keys of OpenSSL and CoreBitcoin running on Android and iOS, respectively [4]. In addition to IT systems, sensors and actuators have been shown to be vulnerable to SCA. For example, non-invasive attacks on anti-lock braking systems are presented in [16].

The major goal of countermeasures against power analysis is to protect the confidentiality of the secret key in such a grey-box environment. Although hardware-based countermeasures tend to provide faster and more secure operation than software-based implementations, they are inflexible and too expensive to be used in low-cost devices. For this reason, software-based countermeasures will find more and more applications in smartcards and other security-critical environments.

Fig. 1.

The radio-based SCA attacking setup in [4]. The radio receiver is placed near the victim’s laptop. The radio’s output is connected to the input of an attacker’s smart phone.

1.2. Related work

Countermeasures against SCA range from simple to complicated and may be software- or hardware-based. To the best of our knowledge:

Simple countermeasures can be easily circumvented by an attacker.

More complicated countermeasures cannot be applied because of the resulting performance degradation.

Specially designed hardware is effective against an attack, but it is inflexible and expensive.

We now review the countermeasures proposed to date and discuss their limitations.

The simplest countermeasure against SCA is to add noise to the power trace. For example, inserting dummy instructions or randomizing the instruction execution sequence makes it harder for an attacker to align the power traces. However, the effect of these simple countermeasures can be reduced by signal processing techniques or by collecting more power traces [19].

The basic idea of hardware-based countermeasures is to design logic cells whose power consumption is independent of the data they process. The aim of dual-rail with precharge logic (DPL) [17] is to hide the internal activity of the circuit from an attacker. The DPL protocol consists of two phases: precharge and evaluation. The precharge phase allows each new computation to start from a known electrical state. In DPL, a binary datum is conveyed by two wires with the states precharge = (0, 0), logical 0 = (0, 1) and logical 1 = (1, 0). Every evaluation consists of the transition of exactly one wire ((0, 0) → (0, 1) or (0, 0) → (1, 0)). However, the increased security of DPL comes at a price. Each computation step takes 2 clock cycles, one for the precharge state and one for the legal (evaluated) state. Moreover, the area of a DPL operator is significantly larger than that of its single-rail counterpart.

Among software countermeasures, it is very common to randomize the sensitive variables with masking techniques when protecting implementations of block ciphers against SCA. One or several random values are added to the secret data during the execution of the cryptographic algorithm, so that every intermediate value is independent of any secret variable. We now look at the masking technique more closely. For a key-dependent intermediate byte x and a random mask m, masking requires a function $f (x, m) = x \cdot m$ , where $\cdot$ is defined as bitwise XOR (Boolean masking), modular addition (additive masking) or modular multiplication (multiplicative masking). The masked SubBytes step of AES can be written as follows. $\begin{array}{l} y \oplus m^{'} = \mathrm{MaskedSbox} (x \oplus m \oplus k), \\ \text{where } m, m^{'} \text{ are random values (masks) and } y = \mathrm{Sbox} (x \oplus k) . \end{array}$

However, using only one mask, which is called first-order masking, is vulnerable to second-order DPA. Specifically, we know that $\begin{array}{ll} (1) & \begin{array}{l} y_{1} \oplus m^{'} = \mathrm{MaskedSbox} (x_{1} \oplus m \oplus k_{1}), \quad y_{1} = \mathrm{Sbox} (x_{1} \oplus k_{1}), \\ y_{2} \oplus m^{'} = \mathrm{MaskedSbox} (x_{2} \oplus m \oplus k_{2}), \quad y_{2} = \mathrm{Sbox} (x_{2} \oplus k_{2}) . \end{array} \\ (2) & y_{1} \oplus m^{'} \oplus y_{2} \oplus m^{'} = y_{1} \oplus y_{2} = \mathrm{Sbox} (x_{1} \oplus k_{1}) \oplus \mathrm{Sbox} (x_{2} \oplus k_{2}) . \end{array}$

To be specific, Eqs (1) and (2) show that an attacker can obtain the unmasked XOR of two S-box outputs by XORing the two masked S-box outputs. This is because $m^{'}$ is canceled out by the XOR operation. A second-order DPA therefore begins by combining two target points of a power trace into one point using subtraction or multiplication. The next step is to mount DPA based on the hypothetical value computed by XORing two S-box outputs [11,14].
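The cancellation in Eq. (2) can be checked with a minimal sketch. The S-box below is a random permutation used as a stand-in (it is not the AES S-box), and the names `masked_sbox_table` and `combined_leak` are hypothetical helpers introduced only for this illustration.

```python
import random

rng = random.Random(1)  # fixed seed, only to make the sketch reproducible

# Stand-in bijective S-box (a random permutation), NOT the real AES S-box.
SBOX = list(range(256))
rng.shuffle(SBOX)

def masked_sbox_table(m, m_out):
    """Precompute MaskedSbox so that MaskedSbox(u ^ m) == Sbox(u) ^ m_out."""
    return [SBOX[v ^ m] ^ m_out for v in range(256)]

def combined_leak(x1, k1, x2, k2, m, m_out):
    """XOR of two masked S-box outputs that share the masks m and m_out."""
    table = masked_sbox_table(m, m_out)
    z1 = table[x1 ^ m ^ k1]  # equals Sbox(x1 ^ k1) ^ m_out, as in Eq. (1)
    z2 = table[x2 ^ m ^ k2]  # equals Sbox(x2 ^ k2) ^ m_out, as in Eq. (1)
    return z1 ^ z2           # m_out cancels, as in Eq. (2)
```

Whatever masks are drawn, `combined_leak` returns the same value as the unmasked `SBOX[x1 ^ k1] ^ SBOX[x2 ^ k2]`, which is the quantity a second-order attacker builds hypotheses on.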

Protection against second-order DPA requires more than two masks, and all intermediate values have to be masked throughout the execution of the algorithm. In particular, the input and output bytes of each S-box must use different masks. For this reason, a masked AES implementation requires 16 masked S-boxes. As a result, high-order masking of AES focuses on the efficient implementation of S-boxes. Unfortunately, Table 1 shows that a high-order masking scheme severely affects the performance of AES. According to the cycle counts in Table 1, the protected implementations are roughly 130 to 340 times slower than the straightforward implementation. This may be intolerable for a practical solution.

Table 1
Performance of the high-order masking scheme in AES

Method	Cycles	RAM (bytes)	ROM (bytes)
Unprotected implementation
No masking [3]	$2 \times 10^{3}$	32	1150
Provably secure second-order SCA resistant implementation
[12]	$675.4 \times 10^{3}$	0	768
[7]	$265.5 \times 10^{3}$	0	816

In addition, some cryptographic algorithms include both Boolean and arithmetic operations. To apply data masking properly, a secure conversion between Boolean and arithmetic masks is needed that does not expose unmasked intermediate values to a second-order DPA. Goubin proposed a secure mask conversion that hides sensitive intermediate values during the conversion process [5]. However, this conversion resists only first-order DPA. In [18], a new conversion method was proposed that also resists second-order DPA. The method of [18] requires $4 \times 2^{k} + 3$ operations for the conversion of a k-bit mask. An arithmetic-to-Boolean conversion requires a similar number of operations. This can be a critical overhead when it is applied to all mask conversions.

1.3. Our contribution

In this paper, we propose a family of grey box secure block ciphers. Based on the basic idea presented at WISA 2014 [8] we give an analysis of the idea and an experimental result to justify the proposal. Our scheme is designed to satisfy the following properties:

Grey box security using table-based implementation. Our scheme generates lookup tables in advance that combine all the functions and operations of the cryptographic algorithm with encoding and decoding (we call these linear and non-linear encoders). The algorithm is then implemented using these lookup tables. This method is similar to white-box cryptography [2] in its use of encoded lookup tables. However, our method takes the round keys as dynamic inputs, while the round keys of white-box cryptography are tightly coupled with the lookup tables. Thus, the round keys can easily be changed depending on the environment, and an attacker cannot predict the intermediate values during the execution of the cryptographic algorithm.

No re-generation of the lookup tables. In contrast to masking techniques, our scheme does not require re-generation of lookup tables or precomputation for each execution of encryption or decryption. It is important to notice that the overhead of masking techniques is largely due to the precomputation of randomly masked lookup tables for each execution. The only precomputation in our scheme is the one-time generation of the encoded lookup tables.

Fast execution of the cryptographic operation. Unlike the white-box implementation, the round keys are not involved in the generation of lookup tables. As a consequence, we do not need to manage lookup tables as large as those of a white-box implementation. Owing to the reduced number of table lookups, our HIGHT implementation takes only 1.79 times the execution time of the straightforward implementation. In addition, we provide a performance analysis of an AES implementation using our method in the Appendix.

The remainder of this paper is organized as follows. Section 2 describes the HIGHT cryptographic algorithm. In Section 3, we introduce the concept of table encoding and explain how to implement the HIGHT algorithm with table encoding. We analyze the security and performance of our method in Section 4 and provide experimental results of real side channel attacks against our implementation in Section 5. Finally, we offer the conclusion in Section 6.

2. HIGHT algorithm

HIGHT (HIGh security and light weigHT) [6] is a symmetric block cipher that encrypts and decrypts 64-bit blocks using a 128-bit key. It provides a light-weight and low-power hardware implementation for ubiquitous computing devices. We briefly introduce the HIGHT algorithm. The 64-bit plaintext and ciphertext are denoted by concatenations of 8 bytes, $P = P_{7} ‖ P_{6} ‖ P_{5} ‖ P_{4} ‖ P_{3} ‖ P_{2} ‖ P_{1} ‖ P_{0}$ and $C = C_{7} ‖ C_{6} ‖ C_{5} ‖ C_{4} ‖ C_{3} ‖ C_{2} ‖ C_{1} ‖ C_{0}$ . The round functions consist of several mathematical operations: ⊞ addition mod $2^{8}$ , ⊟ subtraction mod $2^{8}$ , ⊕ XOR, and $⋘ r$ r-bit left rotation. HIGHT encryption is made up of an initial transformation, the round function, and a final transformation, as described in detail in Algorithm 1.

Algorithm 1

HIGHT encryption

$W K_{i}$ ( $0 ⩽ i ⩽ 7$ ) denotes a whitening key byte and $S K_{i}$ ( $0 ⩽ i ⩽ 127$ ) a subkey byte. The round function uses the functions $F_{0}$ and $F_{1}$ : $\begin{array}{rcl} F_{0} (x) & = & (x ⋘ 1) \oplus (x ⋘ 2) \oplus (x ⋘ 7) \\ F_{1} (x) & = & (x ⋘ 3) \oplus (x ⋘ 4) \oplus (x ⋘ 6) \end{array}$ The decryption process is similar to the encryption of HIGHT.
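As a minimal sketch, the two auxiliary functions and one byte update of the form $X_{1} ⊞ (F_{1} (X_{0}) \oplus SK)$ can be written as follows. The helper names `rotl8`, `f0`, `f1` and `round_part` are our own labels for this illustration, and the subkey is passed as a plain parameter rather than selected by its index.

```python
def rotl8(x, r):
    """Rotate the 8-bit value x left by r bits (0 < r < 8)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def f0(x):
    """F0(x) = (x <<< 1) xor (x <<< 2) xor (x <<< 7)."""
    return rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 7)

def f1(x):
    """F1(x) = (x <<< 3) xor (x <<< 4) xor (x <<< 6)."""
    return rotl8(x, 3) ^ rotl8(x, 4) ^ rotl8(x, 6)

def round_part(x1, x0, sk):
    """One byte update: x1 [+] (F1(x0) xor sk), with addition mod 2**8."""
    return (x1 + (f1(x0) ^ sk)) & 0xFF
```

Both functions are linear over GF(2), so `f0(a ^ b) == f0(a) ^ f0(b)` holds for all bytes; the non-linearity of the round comes from mixing XOR with the modular addition.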

3. Proposed scheme

3.1. Key idea behind

Our scheme is inspired by the white-box implementation [2] of block ciphers. In a white-box implementation, a key-customized encryption function $E_{k}$ is protected by replacing it with $E_{k}^{'} = A \cdot E_{k} \cdot B^{- 1}$ , where A and B are input and output encodings, respectively. Because A and B are chosen randomly without reference to k, their use is unlikely to weaken the ordinary black-box security of $E_{k}$ . However, one serious problem of this solution is the large size of the lookup table. Our motivation is that an attacker in a grey-box model does not have access to the lookup table. For this reason, we generate a dynamic-key lookup table that takes both a key and an operand as inputs. To be specific, it can be represented by $E (k, x) = A (E (k, B^{- 1} (x)))$ , where x is an operand to be combined with k.

By generating a lookup table for $E (k, x)$ , we can significantly reduce the total size of the lookup table compared to a white-box implementation because the table can be shared throughout all rounds. Also, this yields an additional advantage over a white-box lookup table: it can support dynamic key application. In other words, this method can also be used when a secret key is updated from time to time, as in the case of a session key. A potential problem is how to design the lookup table within practical size because a key is added to an input to the table. In the following, we explain how to apply our scheme to HIGHT in such an efficient way.

3.2. Applying table encoding to HIGHT algorithm

It is necessary to conceal all of the intermediate values. Our scheme uses non-linear, linear and additive encodings. Chow et al. [2] proposed input and output encodings to protect a table. An encoding is a bijection. Encodings are networked with the inputs and outputs of tables. A table T is protected with randomly chosen bijections G and H: $T^{'} = H \circ T \circ G^{- 1}$ Here, G is the input encoding, and H is the output encoding. When two tables are looked up in sequence, the encodings are chosen in a networked fashion so that they cancel between the tables. For example, tables $T_{1}$ and $T_{2}$ are protected with encodings as follows. $\begin{array}{rcl} T_{2}^{'} \circ T_{1}^{'} & = & (H \circ T_{2} \circ G^{- 1}) \circ (G \circ T_{1} \circ H^{- 1}) \\ & = & H \circ T_{2} \circ T_{1} \circ H^{- 1} \end{array}$ Encodings obfuscate all lookup tables in our scheme. Furthermore, linear encoders L and M are used to achieve diffusion, as defined by Shannon [15]. There is also an additive encoder to conceal $2 \times 4$ -bit output values. The additive encoder is used to encode the modular addition in a round. $\begin{array}{rcll} G, H, G^{- 1}, H^{- 1} & : & \{0, 1\}^{4} \to \{0, 1\}^{4} & \text{(4-bit non-linear encoders)} \\ L, M, L^{- 1}, M^{- 1} & : & \{0, 1\}^{8} \to \{0, 1\}^{8} & \text{(8-bit linear encoders)} \\ γ_{1}, γ_{2} & : & 0 ⩽ γ_{1} ⩽ 255, \; 0 ⩽ γ_{2} ⩽ 255 & \text{(8-bit additive encoders)} \end{array}$
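The cancellation of networked encodings can be illustrated with a short sketch. It assumes byte-wide random bijections for G and H and arbitrary 256-entry tables for $T_{1}$ and $T_{2}$; these choices, and the helper names, are illustrative only and do not reproduce the 4-bit, linear and additive encoders listed above.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is reproducible

def random_bijection(n=256):
    p = list(range(n))
    rng.shuffle(p)
    return p

def invert(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv

def compose(*tables):
    """compose(f, g, h)[x] == f[g[h[x]]], i.e. f o g o h as one table."""
    result = list(range(256))
    for t in reversed(tables):
        result = [t[v] for v in result]
    return result

# Arbitrary example tables and encodings (assumptions for this sketch).
T1 = [rng.randrange(256) for _ in range(256)]
T2 = [rng.randrange(256) for _ in range(256)]
G, H = random_bijection(), random_bijection()

T1_enc = compose(G, T1, invert(H))   # G o T1 o H^-1
T2_enc = compose(H, T2, invert(G))   # H o T2 o G^-1

# The inner encodings G^-1 and G cancel between the two lookups.
networked = compose(T2_enc, T1_enc)
expected = compose(H, T2, T1, invert(H))
```

Here `networked` and `expected` are the same table, while the value passed between the two lookups is always G-encoded and never appears in the clear.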

Several types of lookup tables (see Figs 3, 4, 5) can be generated with the above encoders.

The modular addition and XOR operations each compute one value from two operands. If both operands are 8-bit values, a single lookup table for the operation needs an entry for each of the $2^{16} (= 65536)$ possible input pairs. Such a table is too big to store in memory. To overcome this problem, it can be transformed into two lookup tables of $2^{12}$ entries each ( $2 \times 2^{12} = 8192$ entries in total), which significantly reduces the table size, as shown in Fig. 2. First, one 8-bit operand and the high 4 bits of the other operand are the inputs of the first lookup table. Then, the 8-bit output of the first table and the low 4 bits of the other operand are the inputs of the second table, which produces the 8-bit result of the modular addition or XOR operation. For example, the XOR of two 8-bit operands can be computed with Type III-1 and III-2 tables. Similarly, there are two tables of Type IV-1 and IV-2 for the modular addition.

Fig. 2.

Reduction of table size. A solid line represents one bit. Table T ( $2^{16}$ bytes) can be transformed into tables $T_{1}$ ( $2^{12}$ bytes) and $T_{2}$ ( $2^{12}$ bytes). $Y_{h}$ and $Y_{l}$ refer to the high 4 bits and the low 4 bits of Y, respectively.
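The table splitting of Fig. 2 can be sketched without any encoding to show why two $2^{12}$-entry tables suffice. The function names `build_split_tables` and `split_lookup` are hypothetical; the index layout (8-bit input in the upper bits, 4-bit input in the lower bits) is an assumption of this sketch.

```python
def build_split_tables(op):
    """Split an 8-bit binary operation into two tables of 2**12 entries.

    t1 combines the 8-bit operand x with the high nibble of y,
    t2 combines the 8-bit output of t1 with the low nibble of y.
    """
    t1 = [op(x, yh << 4) for x in range(256) for yh in range(16)]
    t2 = [op(t, yl) for t in range(256) for yl in range(16)]
    return t1, t2

def split_lookup(t1, t2, x, y):
    """Evaluate op(x, y) with two lookups instead of one 2**16-entry table."""
    t = t1[(x << 4) | (y >> 4)]
    return t2[(t << 4) | (y & 0xF)]

def xor8(a, b):
    return a ^ b

def add8(a, b):
    return (a + b) & 0xFF

XOR_T1, XOR_T2 = build_split_tables(xor8)
ADD_T1, ADD_T2 = build_split_tables(add8)
```

The split works for both operations because $y = 16 \cdot Y_{h} + Y_{l}$ and XOR and addition mod $2^{8}$ are associative, so the two nibbles of y can be absorbed one after the other.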

The table encoding is applied to the HIGHT algorithm, which requires 12 lookup tables of 5 types. Figure 6 shows how the following part of the round function is processed with lookup tables. $\begin{array}{rcl} X_{i + 1, 2} & = & X_{i, 1} ⊞ (F_{1} (X_{i, 0}) \oplus S K_{4 i + 2}) . \end{array}$

Fig. 3.

Tables of Type I.

Fig. 4.

Tables of Type II and Type III.

Fig. 5.

Tables of Type IV and Type V.

Subkey $S K_{4 i + 2}$ is protected by encoding it through a Type I-1 table. Type II-1 and III-2 tables compute the function $F_{1}$ and the XOR. The high 4 bits of the encoded subkey and the 8-bit value $X_{i, 0}$ are the inputs of the Type II-1 table. Then, the output of the Type II-1 table and the low 4 bits of the encoded subkey go into the Type III-2 table. Type IV-1 and IV-2 tables compute the modular addition. The 8-bit value $X_{i, 1}$ and the high 4 bits of the XORed value are the inputs of the Type IV-1 table. The intermediate value $X_{i + 1, 2}$ is obtained from the Type IV-2 table, whose inputs are the output of the Type IV-1 table and the low 4 bits of the XORed value.

Fig. 6.

Processing a part of round with lookup tables.
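The data flow of Fig. 6 can be sketched as a chain of five tables. This is a simplified illustration under stated assumptions: every 8-bit wire is encoded with a random byte-wide bijection, values that are later split into nibbles are encoded with two random 4-bit bijections, and the linear and additive encoders of the scheme are omitted. The table names only loosely mirror the table types in the text.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF

def f1(x):
    return rotl8(x, 3) ^ rotl8(x, 4) ^ rotl8(x, 6)

def bijection(n):
    p = list(range(n))
    rng.shuffle(p)
    return p

def invert(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv

# Hypothetical encoders. Values that are later split into nibbles use two
# 4-bit bijections; all other wires use one 8-bit bijection.
K_HI, K_LO = bijection(16), bijection(16)   # encoded subkey
V_HI, V_LO = bijection(16), bijection(16)   # encoded XOR result
A, B, C, D, OUT = (bijection(256) for _ in range(5))
A_INV, B_INV, C_INV, D_INV, OUT_INV = map(invert, (A, B, C, D, OUT))
K_HI_INV, K_LO_INV, V_HI_INV, V_LO_INV = map(invert, (K_HI, K_LO, V_HI, V_LO))

def enc_nibbles(hi, lo, v):
    return (hi[v >> 4] << 4) | lo[v & 0xF]

# Index layout of every 2**12-entry table: (8-bit input << 4) | 4-bit input.
TYPE_I = [enc_nibbles(K_HI, K_LO, sk) for sk in range(256)]
TYPE_II = [C[f1(A_INV[a]) ^ (K_HI_INV[kh] << 4)]
           for a in range(256) for kh in range(16)]
TYPE_III = [enc_nibbles(V_HI, V_LO, C_INV[c] ^ K_LO_INV[kl])
            for c in range(256) for kl in range(16)]
TYPE_IV1 = [D[(B_INV[b] + (V_HI_INV[vh] << 4)) & 0xFF]
            for b in range(256) for vh in range(16)]
TYPE_IV2 = [OUT[(D_INV[d] + V_LO_INV[vl]) & 0xFF]
            for d in range(256) for vl in range(16)]

def encoded_round_part(enc_x1, enc_x0, sk):
    """Encoded version of x1 [+] (F1(x0) xor sk) using table lookups only."""
    ek = TYPE_I[sk]                            # encode the subkey
    c = TYPE_II[(enc_x0 << 4) | (ek >> 4)]     # F1 and XOR with high key nibble
    v = TYPE_III[(c << 4) | (ek & 0xF)]        # XOR with low key nibble
    d = TYPE_IV1[(enc_x1 << 4) | (v >> 4)]     # add high nibble of XORed value
    return TYPE_IV2[(d << 4) | (v & 0xF)]      # add low nibble of XORed value
```

Decoding the result with `OUT_INV` gives the same byte as the plain computation, while every value that appears between two lookups is encoded. Note that the tables do not depend on the subkey, so the same tables serve every round and every key.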

4. Security and performance analysis

4.1. Security analysis

To demonstrate the security of the proposed method against side channel attacks, we mainly show that a randomized intermediate value is independent of the corresponding non-encoded intermediate value. To do this, we first compare each bit of the encoded and non-encoded intermediate values produced by the proposed scheme and the original HIGHT implementation, respectively. The target intermediate value is $X_{2}$ (third byte, see Algorithm 1) in the first round output because it is affected by the first byte of the first round key. The main step of single-bit DPA is to compute a differential trace after dividing the power traces into two sets according to the value of a target bit. Protection against DPA is therefore justified if, at each bit position, the bits of the non-encoded and the encoded $X_{2}$ differ with probability 1/2. For the verification, we encrypted 10,000,000 different plaintexts using the two HIGHT implementations and compared each bit of the encoded and the non-encoded values of $X_{2}$ . Table 2 shows that they differ with a probability close to 1/2 at every bit position. This property prevents a DPA attacker from constructing the correct sets of power traces; thus DPA is unlikely to work when our scheme is used.

Table 2
Probability that a bit differs between the table encoded and non-encoded intermediate values

Bit position	1	2	3	4	5	6	7	8
Probability	49.99%	50.00%	50.01%	50.00%	50.01%	49.97%	50.00%	49.99%

Table 3

Probability distribution for the Hamming weight of a uniformly distributed 8-bit value [10]

HW	0	1	2	3	4	5	6	7	8
Prob.	0.004	0.031	0.109	0.219	0.273	0.219	0.109	0.031	0.004
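The entries of Table 3 follow from the binomial distribution: a uniformly distributed 8-bit value has Hamming weight w with probability $\binom{8}{w} / 2^{8}$. A minimal sketch that recomputes them is given below; `hw_distribution` is a name chosen for this illustration.

```python
from math import comb

def hw_distribution(bits=8):
    """Probability of each Hamming weight for a uniform value of `bits` bits."""
    total = 2 ** bits
    return [comb(bits, w) / total for w in range(bits + 1)]

# Values printed in Table 3, rounded to three decimals.
TABLE_3 = [0.004, 0.031, 0.109, 0.219, 0.273, 0.219, 0.109, 0.031, 0.004]
```

The exact values are 1/256, 8/256, 28/256, 56/256 and 70/256 for weights 0 to 4, mirrored for weights 5 to 8, and they agree with the table to within rounding.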

In correlation power analysis (CPA), an attacker computes the correlation between the Hamming weight of a hypothetical value and the power consumption [1]. This works because the power consumption of a micro-controller at a given point is known to be proportional or inversely proportional to the Hamming weight of the processed data. To demonstrate protection against CPA, we show that the Hamming weights of the encoded and the non-encoded values of $X_{2}$ are independent of each other. Let ${HW}_{α}$ denote the set of plaintexts that lead to Hamming weight α for the non-encoded value of $X_{2}$ . We have $α \in [0, 8]$ because there are nine possible Hamming weights for an 8-bit value. We encrypted 10,000,000 random plaintexts using the original HIGHT implementation and divided the plaintexts into the sets ${HW}_{α}$ , where $α \in [0, 8]$ . The next step is to show that the plaintexts in each ${HW}_{α}$ lead to well-distributed Hamming weights of $X_{2}$ in our implementation. For this purpose, we repeated the encryption using our proposed implementation for each set ${HW}_{α}$ . If the encoded values of $X_{2}$ are uniformly distributed, their Hamming weights follow the probabilities for a uniformly distributed 8-bit value shown in Table 3. Our experimental results in Table 4 show that, for every $α \in [0, 8]$ , the plaintexts in ${HW}_{α}$ produce encoded values of $X_{2}$ whose Hamming weights closely follow this distribution. This means that, with overwhelming probability, an encoded value and the corresponding non-encoded value are not correlated. We therefore conclude that our table encoding scheme also protects against CPA.

Table 4

Probability distribution for the Hamming weight of a table encoded value

${HW}_{α}$	Encoded HW

	0	1	2	3	4	5	6	7	8
${HW}_{0}$	0.0038	0.0307	0.1062	0.2201	0.2773	0.2196	0.1073	0.0313	0.0038
${HW}_{1}$	0.0038	0.0311	0.1100	0.2189	0.2730	0.2197	0.1093	0.0303	0.0039
${HW}_{2}$	0.0040	0.0312	0.1092	0.2185	0.2738	0.2190	0.1092	0.0312	0.0039
${HW}_{3}$	0.0039	0.0312	0.1097	0.2189	0.2730	0.2186	0.1095	0.0313	0.0038
${HW}_{4}$	0.0040	0.0312	0.1094	0.2186	0.2736	0.2189	0.1092	0.0312	0.0039
${HW}_{5}$	0.0039	0.0312	0.1091	0.2188	0.2736	0.2189	0.1094	0.0312	0.0039
${HW}_{6}$	0.0039	0.0312	0.1090	0.2191	0.2735	0.2189	0.1093	0.0312	0.0039
${HW}_{7}$	0.0038	0.0311	0.1089	0.2183	0.2746	0.2187	0.1088	0.0317	0.0039
${HW}_{8}$	0.0040	0.0314	0.1097	0.2150	0.2744	0.2189	0.1121	0.0306	0.0039

4.2. Performance analysis

In this section, we compare the performance of data masking and our scheme. There has been no secure implementation of HIGHT with second-order masking so far. Therefore, we estimated the approximate overhead by counting the additional operations required. When the conversion algorithm of [18] is used, 10 conversions between Boolean and arithmetic masks are needed for one round function. In addition, the initial and final transformations each require two mask conversions.

If data masking is applied to two sets of four rounds at the beginning and end of the cipher (8 rounds in total) and to the initial and final transformations, mask conversion requires 86,268 operations $(((8 \times 10) + (2 \times 2)) \times 1027)$ because one mask conversion needs 1,027 additional operations. The unmasked HIGHT requires 392 operations: the initial and final transformations need 4 operations each, and each of the 32 rounds needs 12 operations. Thus, the data masking version is estimated to be over 200 times slower than the straightforward version. Even this is an optimistic estimate, as it excludes random number generation for mask conversion.

Applying table encoding to HIGHT requires 16 table lookups for the initial transformation, 20 for the final transformation and 20 for each round. With table encoding applied to 8 rounds, this gives 196 table lookups. Since the remaining 24 non-encoded rounds require 288 operations, the table-encoded version carries out 484 operations in total. Although this suggests that table encoding is about 1.2 times slower than the original HIGHT, the actual runtime should be longer than this estimate because a memory access takes longer than an ALU operation in the CPU.
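The operation counts used in this estimate can be reproduced with a few lines of arithmetic. The variable names are ours; all numbers are taken from the text above, with k = 8 in the conversion cost $4 \times 2^{k} + 3$ of [18].

```python
def mask_conversion_ops(k=8):
    """Operations for one k-bit mask conversion with the method of [18]."""
    return 4 * 2 ** k + 3

# Second-order masking estimate: 10 conversions in each of 8 masked rounds,
# plus 2 conversions each for the initial and final transformations.
masking_extra_ops = (8 * 10 + 2 * 2) * mask_conversion_ops(8)

# Straightforward HIGHT: 4 + 4 operations for the two transformations,
# 12 operations for each of the 32 rounds.
plain_ops = 4 + 4 + 32 * 12

# Table encoding: 16 + 20 lookups for the transformations, 20 per encoded
# round for 8 rounds, plus 24 non-encoded rounds of 12 operations each.
encoded_lookups = 16 + 20 + 8 * 20
encoded_total_ops = encoded_lookups + 24 * 12

masking_ratio = masking_extra_ops / plain_ops
encoding_ratio = encoded_total_ops / plain_ops
```

With these counts, `masking_ratio` is about 220 and `encoding_ratio` is about 1.23, which matches the estimates of "over 200 times" and "1.2 times" given above.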

We implemented the table encoded HIGHT in C on an Intel Core i7. Table 5 shows that the lookup tables occupy around 25 kbytes and that the encryption takes 1.79 times longer than the original HIGHT.

Table 5
Lookup table size and time complexity of table encoded HIGHT

Size of lookup tables	Time complexity
Type I	4 tables	$4 \times 256$	HIGHT (non-encoding)	754 cycles
Type II	2 tables	$2 \times 4096$
Type III	2 tables	$2 \times 4096$	HIGHT (table encoding)	1351 cycles
Type IV	2 tables	$2 \times 4096$
Type V	2 tables	$2 \times 256$
Total	26,112 bytes (25.5 kbytes)	Ratio	1.79 times

5. Experimental result

5.1. Experimental setup

The HIGHT encryption was executed on an 8-bit 16 MHz Atmega128 microcontroller and attacked with the side channel analysis resistant framework (SCARF) system, which verifies the resistance of cryptographic algorithms to power analysis [13]. Figure 7 shows the experimental environment, which consists of SCARF, the Atmega128 and a LeCroy WaveRunner 104Xi-A oscilloscope. SCARF sends an application protocol data unit (APDU) containing a command and a plaintext to the Atmega128, collects power traces and conducts the power analysis. The Atmega128 receives a plaintext from SCARF, encrypts it, and sends the result back to SCARF. While the encryption is performed, the power consumption of the microcontroller is measured by the oscilloscope. Because the recorded voltage drop is proportional to the power consumption of the attacked device, we used the voltage drop as the power consumption and the corresponding trace as the power trace. Figure 8 shows such a recorded voltage drop of the microcontroller while it performed a straightforward HIGHT encryption.

Fig. 7.

Experimental setup.

Fig. 8.

The voltage drop (power consumption) of HIGHT encryption.

5.2. Attacks on HIGHT implementation

In this experiment, we analyze how the power consumption of HIGHT encryption depends on the Hamming weight (HW) of the first round output. Since each byte of a round key is attacked separately, an attacker has to test 256 possible key candidates. Initially, the attacker guesses that one byte of the round key is $k = 0$ . Based on this guess, v (the HW of the first round output) is calculated for each of the n encryptions (for example $n = 100$ ). Afterwards, the attacker estimates the correlation coefficient between v (the HW of the hypothetical intermediate value) and the power consumption at each time step within the power traces. The same procedure is repeated for the other 255 key guesses, so that the attacker obtains 256 correlation plots, one for each key hypothesis. A CPA attack with the HW model is based on the fact that the power consumption is proportional or inversely proportional to the number of bits that are set in the processed data value. Therefore, the attacker can conclude that the key hypothesis that produces the highest peak among the correlation plots is the correct one.
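The attack procedure can be sketched on simulated data. The sketch below is not the SCARF measurement: it assumes a single leakage point per trace equal to the HW of the unprotected value $X_{1} ⊞ (F_{1} (X_{0}) \oplus k)$ plus Gaussian noise, and all function names and parameters are chosen for illustration.

```python
import random

def hw(v):
    return bin(v).count("1")

def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF

def f1(x):
    return rotl8(x, 3) ^ rotl8(x, 4) ^ rotl8(x, 6)

def target(x1, x0, k):
    """Unprotected intermediate value attacked in this simulation."""
    return (x1 + (f1(x0) ^ k)) & 0xFF

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def simulate_traces(key, n, noise, rng):
    """Simulated one-point traces: HW of the target value plus noise."""
    inputs = [(rng.randrange(256), rng.randrange(256)) for _ in range(n)]
    power = [hw(target(x1, x0, key)) + rng.gauss(0, noise)
             for x1, x0 in inputs]
    return inputs, power

def cpa(inputs, power):
    """Return the best key guess and the |correlation| of every guess."""
    scores = []
    for guess in range(256):
        hyp = [hw(target(x1, x0, guess)) for x1, x0 in inputs]
        scores.append(abs(pearson(hyp, power)))
    best = max(range(256), key=scores.__getitem__)
    return best, scores
```

For example, `cpa(*simulate_traces(0x3C, 300, 0.5, random.Random(7)))` ranks the 256 guesses by absolute correlation. The absolute value is used because the leakage may be proportional or inversely proportional to the HW.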

We recorded the power consumption of our experimental board at 250MS/s using a LeCroy WaveRunner 104Xi-A and compressed every 10 points into one point for signal processing. We collected 100 power traces of the non-encoded HIGHT and performed a CPA attack with the HW model using the SCARF. Figure 9 shows the results of the CPA attack on the $X_{2}$ output of the first round. The correct key hypothesis (plotted in black) led to the highest peak with a correlation coefficient of 0.89, while the others (plotted in grey) had correlation coefficients under 0.69. Figure 10 shows a tendency graph of the correlation coefficients for each key candidate. The correct key had the top value after approximately 13 traces were analyzed.

Fig. 9.

Result of the CPA attacks on the non-encoding HIGHT when using the HW model. The y-axes show the correlation and the x-axes show the points of the power trace. Black line: correct key hypothesis, grey line: wrong key hypothesis. The correct key makes the highest peaks in the graph.

Fig. 10.

Tendency graph of correlation coefficient values for each key candidate when attacking non-encoding HIGHT. The correct key held the highest value once about 13 traces had been analyzed.

Fig. 11.

Result of the CPA attacks on the table encoded HIGHT when using the HW model. The wrong key (grey line) makes the highest peak in the graph.

Fig. 12.

Tendency graph of correlation coefficient values for each key candidate when attacking table encoded HIGHT. The correct key (black line) never held the top value in the whole analysis.

For a more rigorous verification than for the straightforward HIGHT, we collected 50,000 power traces at 250MS/s for the table encoded HIGHT. Figure 11 shows the correlation plots, and Fig. 12 is the tendency graph of the CPA attack on the table encoded version. The highest peak of a wrong key had a correlation coefficient of 0.51, while the correct key had 0.34. Furthermore, the correct key was not included in the top 10 list of key candidates (the SCARF provides the top 10 list). As shown in Fig. 12, the correct key was never at the top during the whole trace analysis.

6. Conclusion

Prior works have documented masking methods against standard DPA attacks. However, first-order masking is vulnerable to higher order DPA attacks since the attacks use the correlation coefficient between two or more points. Although the higher order masking schemes have been proposed but they are not easy to implement in practice due to the poor performance. In this study, we demonstrated that it is possible to implement protected block ciphers with only a little overhead. As a result, our scheme takes only 1.79 times longer than the straightforward HIGHT implementation, which almost 200 times less than the second-order masking method. This means that it is possible to implement the table encoded HIGHT implementation on a microprocessor that is secure against SCA by using 25 KB memory. In addition, we provide the result of table encoded AES implementation which show that our scheme can be applied other block cipher in Appendix.

Future work will focus on reducing the table size. While our scheme requires much smaller tables than white-box cryptography, tables of several kilobytes are not acceptable in some environments. The tables could be made smaller at the cost of a slower algorithm. However, the security strength of any such reduction must be analyzed.

Acknowledgements

This work was supported by the K-SCARF project, the ICT R&D program of ETRI (Research on Key Leakage Analysis and Response Technologies).

Experimental result for AES algorithm

We implemented the table encoded AES in C and tested it on an Intel Core i7. Table 6 shows that the lookup tables occupy 10 kbytes and that the implementation takes 1.16 times longer than the non-encoded AES. We collected 50,000 power traces at 250 MS/s for the table encoded AES. Figure 13 shows the correlation plots, and Fig. 14 shows the tendency graph of the CPA attack on the table encoded AES.

References

1. Brier, Clavier and Olivier, Correlation power analysis with a leakage model, in: Cryptographic Hardware and Embedded Systems – CHES 2004, Springer, Berlin/Heidelberg, 2004, pp. 16–29. doi:10.1007/978-3-540-28632-5_2.

2. Chow et al., White-box cryptography and an AES implementation, in: Selected Areas in Cryptography, Springer, Berlin/Heidelberg, 2002.

3. Fumaroli et al., Affine masking against higher-order side channel analysis, in: Selected Areas in Cryptography, Springer, Berlin/Heidelberg, 2010.

4. Genkin et al., Stealing keys from PCs using a radio: Cheap electromagnetic attacks on windowed exponentiation, in: Cryptographic Hardware and Embedded Systems – CHES 2015, Springer, Berlin/Heidelberg, 2015, pp. 207–228. doi:10.1007/978-3-662-48324-4_11.

5. Goubin, A sound method for switching between Boolean and arithmetic masking, in: Cryptographic Hardware and Embedded Systems – CHES 2001, Springer, Berlin/Heidelberg, 2001.

6. Hong et al., HIGHT: A new block cipher suitable for low-resource device, in: Cryptographic Hardware and Embedded Systems – CHES 2006, Springer, Berlin/Heidelberg, 2006, pp. 46–59. doi:10.1007/11894063_4.

7. Kim, Hong and Lim, A fast and provably secure higher-order masking of AES S-box, in: Cryptographic Hardware and Embedded Systems – CHES 2011, Springer, Berlin/Heidelberg, 2011, pp. 95–107. doi:10.1007/978-3-642-23951-9_7.

8. Kim et al., Function masking: A new countermeasure against side channel attack, in: Information Security Applications, Springer International Publishing, 2014, pp. 331–342.

9. Kocher, Jaffe and Jun, Differential power analysis, in: Advances in Cryptology – CRYPTO’99, Springer, Berlin/Heidelberg, 1999.

10. Mangard, Oswald and Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards, Vol. 31, Springer Science & Business Media, 2008.

11. Oswald et al., Practical second-order DPA attacks for masked smart card implementations of block ciphers, in: Topics in Cryptology – CT-RSA 2006, Springer, Berlin/Heidelberg, 2006, pp. 192–207. doi:10.1007/11605805_13.

12. Rivain and Prouff, Provably secure higher-order masking of AES, in: Cryptographic Hardware and Embedded Systems – CHES 2010, Springer, Berlin/Heidelberg, 2010, pp. 413–427. doi:10.1007/978-3-642-15031-9_28.

13. SCARF: Side channel analysis resistant framework, (online) available from http://www.k-scarf.or.kr.

14. Schramm and Paar, Higher order masking of the AES, in: Topics in Cryptology – CT-RSA 2006, Springer, Berlin/Heidelberg, 2006, pp. 208–225. doi:10.1007/11605805_14.

15. C.E. Shannon, Communication theory of secrecy systems, Bell System Technical Journal 28(4) (1949), 656–715. doi:10.1002/j.1538-7305.1949.tb00928.x.

16. Shoukry et al., Non-invasive spoofing attacks for anti-lock braking systems, in: Cryptographic Hardware and Embedded Systems – CHES 2013, Springer, Berlin/Heidelberg, 2013, pp. 55–72. doi:10.1007/978-3-642-40349-1_4.

17. Tiri and Verbauwhede, A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation, in: Proceedings of the Conference on Design, Automation and Test in Europe, Vol. 1, IEEE Computer Society, 2004.

18. Vadnala, Kumar and Großschädl, Algorithms for switching between Boolean and arithmetic masking of second order, in: Security, Privacy, and Applied Cryptography Engineering, Springer, Berlin/Heidelberg, 2013, pp. 95–110. doi:10.1007/978-3-642-41224-0_8.

19. Zhou and Feng, Side-channel attacks: Ten years after its publication and the impacts on cryptographic module security testing, IACR Cryptology ePrint Archive 2005 (2005), 388.