Abstract
Nowadays, secret keys of networked devices are profoundly attacked by power analysis attacks, caused by the dramatic evolution of statistical analysis with a simple experimental setup. Recently, OpenSSL and CoreBitcoin running on Android and iOS have been broken by power analysis. Moreover, sensors and actuators can also be attacked thereby threatening user’s privacy and security. To resolve these challenges, power-analysis-resistant implementations of cryptographic algorithms in networked devices have received a lot of attentions. Masking schemes have been developed to implement secure cryptographic algorithms against side-channel analysis (SCA) attacks. Technically, the first-order masking method is vulnerable to the second order differential power analysis (2ODPA) attacks, but the current solutions against 2ODPA are expensive to be implemented. Moreover, worse performance will be shown if the cryptographic algorithms include boolean and arithmetic operations. In this paper, we propose a new countermeasure scheme to resist SCA attacks. Our scheme randomizes all the intermediate values of block cipher by encoding functions in the algorithm to lookup table and makes it resistant to power analysis attack. We apply our scheme to the block cipher algorithm, HIGHT. Our protected implementation of HIGHT takes only 1.79 times compared to the straightforward algorithm, and it needs 25 kbytes to store lookup tables in memory.
Introduction
Background
In the black box model which is a class attack type in the field of symmetric-key cryptography, an attacker is assumed to be able to access only inputs and outputs of the cipher with known- or chosen-plaintexts or ciphertexts. After Kocher et al. [9] introduced simple power analysis (SPA) and differential power analysis (DPA), a lot of researches have been published about various methods to protect against these power analysis attacks. SPA, as one of the side-channel analysis (SCA) attacks, involves a visual examination of graphs of the current used by a device over time. Variations in power consumption occur that the device performs different operations. Just one power trace is enough to launch an attack, provided that an attacker knows about the wave patterns of target operations. A DPA attacker computes the intermediate values of the target operation and statistically analyzes power consumption measurements from an attacked device thereby does not require detailed information about the device. Sometimes, SCA is classified as a grey box model because an attacker has more information compared to the black box model. Generally speaking, SCA is a physical attack to find out secret data by using leaked information, such as power consumption, electromagnetic wave, or timing, which are obtained through the physical implementation of a cryptosystem. In this sense, SCA includes not only power analysis but also fault attacks or electromagnetic analysis. Here we focus on power analysis, where an attack uses the statistical dependency between the intermediate values and the leaked information to determine the secret key related to the attacked intermediate values.1
A preliminary version of this paper appeared in World Conference on Information Security Applications 2014, August 25–27, Jeju, Korea. This version includes a concrete analysis and supporting experimental results.
Previously, SCA has been considered too costly due to its expensive experimental setup. In particular, a specially-designed board to communicate with IC cards and a oscilloscope to measure power consumption of an attacked device are too expensive and not easily available. However, recent researches show that networked computers or smart phones can be attacked with cheap and readily available equipment. Figure 1 shows experimental setup that is used to mount a SCA on a laptop [4]. The radio receiver is placed near the target and the its output is connected to the input of smart phone recording the signal. Especially, it is demonstrated that an attacker can reveal the secret keys from OpenSSL and CoreBitcoin running on Android and iOS respectively [4]. In addition to IT systems, sensors and actuators are shown to be vulnerable to SCA. For example, non-invasive attacks on Anti-lock Breaking Systems are presented in [16].
The major goal of countermeasures against power analysis is to protect the confidentiality of secret key in such a grey box environment. Although hardware-based countermeasures tend to provide faster and more secure operations than software-based implementations, they are not flexible and too expensive to be used in low-cost devices. For this reason, software-based countermeasures will find more and more applications in smartcards and other security-critical environments as well.

The radio-based SCA attacking setup in [4]. The radio receiver is placed near the victim’s laptop. The radio’s output is connected to the input of an attacker’s smart phone.
Countermeasures against SCA are varying from simple to complicated ones and software or hardware based. To the best of our knowledge, we know that:
Simple countermeasure can be easily avoided by an attacker. More complicated countermeasures can not be applied due to the consequent performance degradation. Specially designed hardware is effective against an attack, but it is not flexible and, in addition, expensive.
We will now explore the countermeasures up to present and discuss their limitations.
The most simple countermeasure against SCA is putting a noise to power trace. For example, inserting dummy instructions or randomization of the instruction execution sequence may give trouble to an attacker to align the power traces. However, effect of these simple countermeasures can be reduced by various signal processing technique or more power traces [19].
The basic idea of hardware-based countermeasures is to design logic cells with a power consumption that is independent of the data they process. The aim of dual-rail with precharge logic (DPL) [17] is to hide the internal circuit’s activity from an attacker. The protocol of the DPL consists of two phases: precharge and evaluation. The precharge phase admits to start new computations from a known electrical state. A binary datum is conveyed by two wires for each state: precharge = (0, 0), logical 0 = (0, 1) and logical 1 = (1, 0) in DPL. Every evaluation consists of the transition of exactly one wire ((0, 0) → (0, 1) or (0, 0) → (1, 0)). However, this increased security of DPL comes at a price. Each step of computation takes 2 clock cycles which are precharge and legal state. Moreover, the surface of DPL operator is significantly larger than its single-rail.
In the case of software countermeasures, it is very common to randomize the sensitive variables by masking techniques when a countermeasure is used to protect implementations of block ciphers against SCA. One or several random values are added to the secret data during the execution of cryptographic algorithms thereby every intermediate value is independent of any secret variable. Let’s take a close look at the masking technique. For a key-dependent intermediate byte x and a random mask m, masking requires a function
However, using only one mask which is called a first-order masking is vulnerable to a second-order DPA. Specifically, we know that
To be specific, Eqs (1) and (2) show that an attacker can obtain a non-masked result value of XORing two S-box outputs by XORing two masked S-box outputs. This is due to the fact that
Protection of second-order DPA requires more than two masks, and all intermediate values have to be masked throughout the execution of the algorithm. In particular, each of the input and output bytes of S-box must use different masks. For this reason, a masked AES implementation requires 16 masked S-boxes. As a result, a high-order masking of AES is focused on the efficient implementation of S-boxes. Unfortunately, Table 1 shows that implementing a high-order masking scheme affects the performance of AES. To be more precise, the countermeasures are 150–300 times slower than a straightforward implementation. This might be an intolerable performance for a practical solution.
Performance of the high-order masking scheme in AES
Performance of the high-order masking scheme in AES
In addition, some cryptographic algorithms include both Boolean and arithmetic operations. To properly apply data masking, it is necessary to use a secure Boolean-from/to-arithmetic mask conversion without exposing non-masked intermediate values against a second-order DPA. Goubin proposed secure mask conversion which can hide sensitive intermediate in the conversion process [5]. However, this conversion can only resist the first-order DPA. In [18], a new conversion method was proposed, which can work for the second-order DPA. [18]’s method requires
In this paper, we propose a family of grey box secure block ciphers. Based on the basic idea presented at WISA 2014 [8] we give an analysis of the idea and an experimental result to justify the proposal. Our scheme is designed to satisfy the following properties:
Grey box security using table-based implementation. Our scheme generates lookup tables in advance which combine all the functions and operations in the cryptographic algorithm with encoding and decoding (we call them linear and non-linear encoder). The algorithm will be implemented by using these lookup tables. Actually, this method is similar to white-box cryptography [2] because of encoded lookup tables. However, our method dynamically inputs the round keys while the round keys of white box cryptography is tightly coupled with lookup tables. Thus, it is possible to easily change the round keys depending on the environment and an attacker cannot predict all the intermediate values during the execution of cryptographic algorithms.
No re-generation of the lookup tables. In contrast to masking techniques, our scheme does not requires re-generation of lookup tables or precomputation for each execution of encryption or decryption. It is important to notice that the overhead of masking techniques is largely due to the precomputation of randomly masked lookup tables for each execution. The precomputation in our scheme is only for the generation of encoded lookup tables.
Fast execution of the cryptographic operation. Unlike the white box implementation, the round keys are not involved in the generation of lookup tables. This fact gives us that we do not need to manage a large size of lookup tables unlike the white box implementation. Due to the reduced number of table lookups the performance result shows that our HIGHT implementation consumes only 1.79 times of the execution time compared to the straightforward implementation. In addition, we provide performance analysis of an AES implementation using our method in Appendix.
The remainder of this paper is organized as follows. Section 2 describes the existing countermeasures of SCA and the HIGHT cryptographic algorithm. In Section 3, we introduce the concept of table encoding and explain the implementation method of HIGHT algorithm to apply table encoding. We show the analysis of the security and performance of our method in Section 4 and provide experimental results of real side channel attacks against our implementation in Section 5. Finally, we offer the conclusion in Section 6.
HIGHT algorithm
The HIGHT (HIGh security and light weigHT) [6] is a symmetric cipher, which encrypts and decrypts data with a 64-bit block cipher using a key of size 128 bits. It provides light-weight and low-power hardware implementation for ubiquitous computing devices. We will briefly introduce the HIGHT algorithm. The 64-bit plaintext and ciphertext are denoted by concatenations of 8 bytes such as

HIGHT encryption
Key idea behind
Our scheme is inspired by a white-box implementation [2] of block ciphers. Protection of a key-customized encryption function
By generating a lookup table for
Applying table encoding to HIGHT algorithm
It is necessary to conceal all of the intermediate values. Our scheme uses non-linear, linear encoding and additive encoding. Chow et al. [2] proposed input and output encodings to protect a table. An encoding is a bijection. Encodings are networked with the input and output of tables. A table T is prevented with chosen bijections G, H
Several types of lookup tables (see Figs 3, 4, 5) could be generated with above encoders.
The modular addition and XOR operation result in a value with two operands. If two input values are 8-bit, it can be shown that all of the

Reduction of table size. A solid line represent one bit. Table T (
The table encoding is applied to the HIGHT algorithm. It is necessary to make 12 lookup tables of 5 types for the HIGHT algorithm. Figure 6 shows how to process the underlying a part of round function with lookup tables.

Tables of Type I.

Tables of Type II and Type III.

Tables of Type IV and Type V.
Subkey

Processing a part of round with lookup tables.
Security analysis
To demonstrate the security of the proposed method against side channel attacks, we mainly show that a randomized intermediate value is independent from a non-encoded intermediate value. To do this, we first compare each bit of encoded and non-encoded intermediate values using the proposed scheme and the original HIGHT implementations, respectively. The target intermediate value to be compared is
Probability of different bit between table encoded and non-encoded intermediate
Probability of different bit between table encoded and non-encoded intermediate
Probability distribution for the Hamming weight of a uniformly distributed 8-bit value [10]
In the case of correlation power analysis (CPA), an attacker computes a correlation value between the Hamming weights of a hypothetical value and the power consumption [1]. This is due to the fact that the power consumption of a micro-controller at a given point is known to be proportional or inversely proportional to the Hamming weight of a processed data. To demonstrate the protection of CPA, we show that the Hamming weights of a encoded and a non-encoded values of
Probability distribution for the Hamming weight of a table encoded value
In this section, we compare the performance of the data masking and our scheme. There has been no secure implementation of HIGHT with second-order masking so far. Therefore, we estimated the approximate overhead by calculating the number of additional operations required. When the conversion algorithm of [18] is used, Boolean and arithmetic mask conversions are needed 10 times for one round function. Also, initial and final transformations are both required two times mask conversion.
If data masking is applied at the beginning and end of 2 set of 4 round, for a total of 8 rounds, initial and final transformation, the required operations of mask conversion are 86,268
Applying table encoding to HIGHT requires 16 table lookups for initial transformation, 20 for final transformation and 20 for each round. For 8 rounds table encoding, table lookups will be 196 times. Since remaining 24 non-encoded rounds require 288 operations, total operations for table encoding are carried out 484 times. Although this means table encoding is 1.2 times slower than original HIGHT, the actual runtime should be longer than the expectation duration because memory operation takes longer than ALU operation in the CPU.
We implement the table encoded HIGHT in C using a Intel core i7. Table 5 shows that lookup tables are around 25 kbytes and it takes 1.79 times longer than original HIGHT.
Lookup table size and time complexity of table encoded HIGHT
Lookup table size and time complexity of table encoded HIGHT
Experimental setup
Implementation of HIGHT encryption was executed on an 8-bit 16 MHz Atmega128 microcontroller and it was attacked by side channel analysis resistant framework (SCARF) system that verified the resistance to power analysis of cryptographic algorithm [13]. Figure 7 shows the experimental environment, which consists of SCARF, Atmega128 and an oscilloscope LeCroy WaveRunner 104Xi-A. SCARF sends an application protocol data unit (APDU) including a command and a plaintext to Atmega128, collected power traces and conducted power analysis. Atmeg128 receives a plaintext from SCARF, encrypts it, and sends the result back to the SCARF. During the time the encryption is performed, power consumption of the microcontroller was measured by the oscilloscope. Because the recorded voltage drop is proportional to the power consumption of the attacked device, we used the voltage drop as power consumption and the corresponding trace as the power trace. Figure 8 shows such a recorded voltage drop of the microcontroller while it performed a straightforward HIGHT encryption.

Experimental setup.

The voltage drop (power consumption) of HIGHT encryption.
In this experiment, we analyze how the power consumption of HIGHT encryption depends on the hamming weight (HW) of first round output. Since each byte of a round key is attacked separately, an attacker can guess 256 possible key candidates. Initially, an attacker may guess one byte of round key
We have recorded the power consumption of our experimental board at 250MS/s using a LeCroy Wave Runner 104Xi-A and compressed each 10 point into one point for signal processing. We collected 100 power traces with non-encoding HIGHT and performed a CPA attack with the HW model using the SCARF. Figure 9 shows the results of a CPA attack for the

Result of the CPA attacks on the non-encoding HIGHT when using the HW model. The y-axes show the correlation and the x-axes show the points of the power trace. Black line: correct key hypothesis, grey line: wrong key hypothesis. The correct key makes the highest peaks in the graph.

Tendency graph of correlation coefficient values for each key candidate when attacking non-encoding HIGHT. The correct key held the highest value since 13 traces are analyzed.

Result of the CPA attacks on the table encoded HIGHT when using the HW model. The wrong key (grey line) makes the highest peak in the graph.

Tendency graph of correlation coefficient values for each key candidate when attacking table encoded HIGHT. The correct key (black line) never held the top value in the whole analysis.
For more rigid verifications compared to the straightforward HIGHT, we collected 50,000 power traces at 250MS/s for table encoded HIGHT. Figure 11 shows the correlation plots, and Fig. 12 is the tendency graph of the CPA attack on the table encoding version. The highest peak of the wrong key had the correlation coefficient of 0.51 while the correct key has 0.34. Furthermore, the correct key was not included the top 10 lists of the key candidates (the SCARF provides the top 10 list). As shown in Fig. 12, the correct key was never at top in the whole traces analysis.
Prior works have documented masking methods against standard DPA attacks. However, first-order masking is vulnerable to higher order DPA attacks since the attacks use the correlation coefficient between two or more points. Although the higher order masking schemes have been proposed but they are not easy to implement in practice due to the poor performance. In this study, we demonstrated that it is possible to implement protected block ciphers with only a little overhead. As a result, our scheme takes only 1.79 times longer than the straightforward HIGHT implementation, which almost 200 times less than the second-order masking method. This means that it is possible to implement the table encoded HIGHT implementation on a microprocessor that is secure against SCA by using 25 KB memory. In addition, we provide the result of table encoded AES implementation which show that our scheme can be applied other block cipher in Appendix.
Future work will focus on the reduction of table size. While our scheme requires much smaller tables than white-box cryptography, several kilobytes size table can not be accepted in some circumstances. It will be possible to reduce the size of tables if we slow down the speed of an algorithm. However, security strength should be analyzed for table reduction.
Footnotes
Acknowledgements
This work was supported by the K-SCARF project, the ICT R&D program of ETRI (Research on Key Leakage Analysis and Response Technologies).
Experimental result for AES algorithm
We implemented the table encoded AES in C and tested using Intel core i7. Table 6 shows that lookup tables were 10 kbytes and it took 1.16 times longer than non-encoded AES. We collected 50,000 power traces at 250MS/s for table encoded AES. Figure 13 shows the correlation plots and Fig. 14 shows the tendency graph of the CPA attack on the table encoded AES.
