Abstract
The in-car voice controllable system has become an almost standard feature in smart cars. Prior work shows that voice controllable systems are vulnerable to the light commands attack, which uses a laser as the medium to inject voice commands. In this article, we first reproduce the light commands attack on an acoustically isolated in-car voice controllable system under several scenarios with a lightweight solution. We validate the feasibility of injecting malicious voice commands into the microphone through a window by modulating a laser beam. Then, we test whether mainstream countermeasures, such as applying sunscreen film to the window glass, can protect the microphone from being attacked. We find that the lower the light transmittance of the sunscreen film, the lower the success rate of the attack. Experimental results also show that when the transmittance of the film is 50%, the darkest sunscreen film that can legally be applied, the attack success rate decreases by up to 0.4. We also explore the impact of the attack angle by changing the incidence angle of the laser beam; the results demonstrate that the light commands attack is sensitive to the attack angle, with a successful angle range of ±15°. Finally, we propose a series of hardware-based protection schemes against light commands attacks.
Introduction
With the rapid development of computer technology and artificial intelligence, voice controllable systems (VCSs) have been widely deployed on many platforms, such as smartphones and in-car control systems. An in-car VCS brings great convenience to drivers, especially when both of the driver's hands are occupied while driving. The National Development and Reform Commission claims that China's share of smart cars will reach 50% by 2020. 1 At the same time, however, VCSs introduce security problems into the vehicle system, because they are vulnerable to attacks.
Previous research has confirmed the fragility of VCSs, such as their lack of user identity authentication.2,3 Other studies show that hidden or inaudible voice commands can control a VCS. However, today's cars have very strong noise isolation, and some high-end cars (such as the Nissan Fuga) are even equipped with active noise cancelation. It is therefore almost impossible to inject sound signals directly from outside the car into the in-car microphone. For the microphone to pick up an acoustic signal, the signal would have to be played outside the car at a volume high enough to immediately attract the attention of the car owner and expose the attacker.
The state-of-the-art light commands attack 4 demonstrated the optical coupling phenomenon of microelectromechanical system (MEMS) microphones. Once the optical power directly hitting the diaphragm exceeds a certain threshold, the analog output voltage of the microphone is linearly and positively correlated with the light intensity. That is, a MEMS microphone can convert an optical signal into an electrical signal in the same way as it converts a sound signal, so malicious commands can be injected into the target microphone by modulating the intensity of a laser. As a result, the light commands attack bypasses noise isolation and is not detected by active noise cancelation systems. The only requirement is an effective light path from outside the car to the microphone, which makes it an effective way to attack acoustically isolated systems.
However, light commands also inherit the strong directivity of the laser, which hardly diffracts or scatters. This requires the laser beam to strike the microphone port directly at a small angle of incidence. In addition, the power of the laser beam is attenuated by the vehicle's window glass and any sunscreen film on it. In this article, we reproduce and investigate light commands attacks against in-car VCSs in this particular attack scenario with a lightweight solution. To simulate a real usage scenario, we covered the window with two common sunscreen films to test how well films of different transmittance protect against light-coupled attacks. Then, we tested different incident angles to determine the maximum incident angle at which a successful attack can be carried out. Finally, we put forward several protective measures. In summary, our contributions are as follows.
Reproduce and optimize light commands attack
We confirmed the feasibility of light commands attacks in the smart car scenario. We then proposed a lighter and cheaper setup based on the originally suggested equipment.
Tests for sunscreen films
After that, we tested the two commonly used sunscreen films (windshield film and window glass film) and showed that the films provide only limited protection against light commands attacks.
Tests for angle of incidence
We tested the recognition rate at incident angles of 5°, 10°, 15°, and 20°, and found that the recognition rate was above 99% at every angle up to 15° but dropped abruptly to 0 at 20°.
Suggest hardware-based defense strategies
Finally, we propose a series of hardware-based protection schemes against light commands attacks in smart car scenarios based on the characteristics of the laser.
Background
In this section, we first describe the basic concept of smart cars and then introduce common attacks on VCSs.
Smart cars
Most smart cars today are equipped with hundreds of sensors, which not only make travel more comfortable for their users but also help guarantee their safety. Eskandarian 5 divides smart cars into three categories according to their degree of autonomy: smart cars with high autonomy can drive without any driver assistance, smart cars with moderate autonomy assist the driver as necessary, and smart cars with low autonomy (pure driving) transfer all control over the vehicle to the driver and merely warn the driver of potential errors. Systems such as the anti-lock braking system (ABS) and stability control, together with other components that constantly measure the vehicle's condition and help provide a safe and comfortable ride, run automatically in the vehicle. 6
Attacks on smart cars
Smart cars have become part of the Internet of Things (IoT) by integrating multiple sensors and connecting to the Internet, and have therefore inherited the security vulnerabilities of the IoT. Školc and Markelj 7 note that data generated during driving are monitored, collected, and analyzed, and point out that an unauthorized person may carry out the following activities: damage or loss; wiretapping, bugging, interception, or hijacking; man-in-the-middle (MITM) attacks; criminal abuse; disclosure of confidential information; identity fraud; and malware.
Voice-controllable system
The term “VCS” refers to a system that can record, comprehend, and execute voice commands spoken directly by users in natural language. A VCS is therefore basically composed of three main subsystems: 8 voice capture, speech recognition, and command execution. The voice capture subsystem records ambient sound, which is amplified, filtered, and digitized before being sent to the speech recognition subsystem. Next, the speech recognition subsystem interprets the meaning of the voice command using signal processing and natural language processing. Finally, the command execution subsystem manipulates physical devices, for example, servos, to take the corresponding actions. With the widespread use of VCSs, most smart cars are now equipped with one, allowing drivers to operate the car hands-free without interrupting driving.
Acoustic signal injection attacks
A series of previous works has used acoustic signal injection to manipulate a variety of systems.
Specifically, Son and Shin 9 found that MEMS sensors are sensitive to ultrasound signals, which can be used to carry out denial-of-service attacks against the inertial measurement units of drones. Trippel and Weisse demonstrated the “Walnut” attack, 10 which also uses ultrasound signals to spoof the MEMS accelerometers in smartphones. Some works 11–14 also explored stealthy ways to inject commands that prevent the user from recognizing or even hearing them. The Dolphin attack, carried out by Zhang and Yan, 8 takes advantage of the nonlinearity of the microphone: by modulating the baseband voice command onto an ultrasonic carrier, the acoustic signal becomes inaudible, yet the command is recovered by the microphone's nonlinearity and subsequent low-pass filtering. In Ghost Talk, 15 Foo Kune and Backes exploited electromagnetic coupling in sensor circuits to turn injected electromagnetic signals into audio signals, that is, they produced a voice signal from a signal of an entirely different physical form.
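The demodulation that the Dolphin attack relies on can be illustrated with a toy numerical model. This is a sketch under stated assumptions, not the attack's actual implementation: the microphone is modeled as a simple square-law nonlinearity, and the 25 kHz carrier, 400 Hz tone, nonlinearity coefficients, and ideal 8 kHz low-pass filter are all hypothetical values chosen for illustration.

```python
import numpy as np

FS = 192_000          # simulation sample rate (Hz), high enough for the carrier
CARRIER_HZ = 25_000   # hypothetical ultrasonic carrier, above human hearing
TONE_HZ = 400         # hypothetical stand-in for one baseband voice component


def am_modulate(baseband, fs, fc, depth=1.0):
    """Amplitude-modulate a baseband signal onto an ultrasonic carrier."""
    t = np.arange(len(baseband)) / fs
    return (1.0 + depth * baseband) * np.cos(2 * np.pi * fc * t)


def microphone_nonlinearity(x, a1=1.0, a2=0.1):
    """Toy square-law microphone model (assumed): out = a1*x + a2*x**2."""
    return a1 * x + a2 * x ** 2


def lowpass_fft(x, fs, cutoff):
    """Ideal low-pass filter: zero every FFT bin above the cutoff frequency."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[freqs > cutoff] = 0
    return np.fft.irfft(spectrum, n=len(x))


def dominant_frequency(x, fs):
    """Frequency (Hz) of the strongest non-DC component."""
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs[np.argmax(spectrum)]


if __name__ == "__main__":
    t = np.arange(int(0.1 * FS)) / FS
    baseband = 0.5 * np.sin(2 * np.pi * TONE_HZ * t)
    transmitted = am_modulate(baseband, FS, CARRIER_HZ)
    recorded = lowpass_fft(microphone_nonlinearity(transmitted), FS, cutoff=8_000)
    print("transmitted peak (Hz):", dominant_frequency(transmitted, FS))
    print("recorded peak (Hz):", dominant_frequency(recorded, FS))
```

The transmitted energy sits at the inaudible carrier, but the squared term produces a copy of the baseband tone that survives the low-pass filter; with `a2=0` (a perfectly linear microphone) nothing audible remains.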
Laser injection attacks
In addition to sound, lasers have also been used as a medium to inject malicious signals. A line of work has achieved denial-of-service attacks on cameras and LiDARs.16–18 This attack was later extended by Cao et al., 19 who injected modulated signals into LiDAR systems, causing the LiDAR to detect a fake object. Later studies have revealed that various sensors, such as infrared and light sensors, can be used to transfer and spread malware between infected devices. 20 More generally, laser illumination can cause transient faults in semiconductors. 21 Exploiting this flaw, Skorobogatov and Anderson 22 presented the first light-induced attacks on smart cards and microcontrollers, showing that a laser can flip bits in memory cells.
Light commands is a new class of signal injection attacks based on the photoacoustic effect, that is, MEMS microphones can respond to light as if it were sound. By aiming light that is amplitude-modulated with a voice command at the microphone's aperture, the attacker makes the microphone output the original voice command even though no actual acoustic signal is received. Light commands attacks can be launched from up to 110 m away and can penetrate transparent media such as windows. 4
MEMS microphones
A microelectromechanical system (MEMS) integrates mechanical components on a chip, which makes it small, cheap, and power-efficient. MEMS microphones are particularly popular in mobile and embedded applications (such as smartphones and smart cars) because of their small footprint and low price. The structure of a commonly used MEMS microphone is shown in Figure 1.
Figure 1. Structure of a microelectromechanical system microphone. 4
A typical MEMS microphone consists of a diaphragm and an ASIC. The diaphragm is a thin membrane that vibrates in response to acoustic waves. Together with a fixed backplate, the diaphragm forms a parallel-plate capacitor whose capacitance changes as the diaphragm deforms mechanically in response to the constantly changing sound pressure. Finally, the ASIC die converts this capacitance change into a voltage signal at the microphone output.
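Because the diaphragm and backplate behave as a parallel-plate capacitor, the size of the capacitance change the ASIC has to convert can be estimated with the ideal formula C = ε0·εr·A/d. The sketch below assumes ideal, rigid, parallel plates in air; the diaphragm diameter and air gap are hypothetical round numbers, not ADMP401 parameters.

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m


def plate_capacitance(area_m2, gap_m, eps_r=1.0):
    """Ideal parallel-plate capacitance: C = eps_r * eps_0 * A / d."""
    if area_m2 <= 0 or gap_m <= 0:
        raise ValueError("area and gap must be positive")
    return eps_r * EPSILON_0 * area_m2 / gap_m


def capacitance_change(area_m2, gap_m, displacement_m):
    """Capacitance change when the diaphragm moves displacement_m toward the
    backplate (positive = smaller gap). Algebraically equal to C0 * x / (d - x)."""
    return plate_capacitance(area_m2, gap_m - displacement_m) - plate_capacitance(area_m2, gap_m)


if __name__ == "__main__":
    # Hypothetical geometry: 0.5 mm diameter diaphragm and a 4 um air gap.
    area = math.pi * (0.25e-3) ** 2
    gap = 4e-6
    print(f"rest capacitance: {plate_capacitance(area, gap) * 1e12:.3f} pF")
    for x_nm in (1, 10, 50):
        dc = capacitance_change(area, gap, x_nm * 1e-9)
        print(f"{x_nm:>3} nm displacement -> dC = {dc * 1e15:.3f} fF")
```

For displacements much smaller than the gap, ΔC is approximately C0·x/d, i.e., roughly linear in diaphragm motion, which is why the output voltage tracks sound pressure.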
Threat model
The attacker's goal is to inject malicious commands into a targeted smart car with extremely strong sound isolation, without being detected by the car owner or other occupants. In this case, a smart car can be abstracted as a translucent rigid box with a MEMS microphone inside, as shown in Figure 2. More specifically, we consider the following threat model.
Figure 2. Smart cars are abstracted as translucent rigid boxes. Light can penetrate the box but sound is blocked.
Complete sound isolation
Given the typical noise reduction level of high-end smart cars, we assume that the car body completely insulates its interior from outside sound. In other words, no matter how loud the outside sound is, it cannot be heard inside the car at all. Therefore, the car body can be regarded as an ideal rigid box that is not set into vibration by external sound waves, so external sounds cannot penetrate into the box.
Light path
We also assume that there is a direct light path from the laser source to at least one microphone port. This is reasonable because smart cars often have multiple microphones, and at least one microphone port is usually reachable by light through a window. Therefore, we abstract the car body as a transparent box with a MEMS microphone inside.
Existence of translucent medium
We note that most car owners cover the windows with sunscreen film after purchasing a new car to reduce the light transmittance of the windows, which significantly reduces the transmitted light intensity. Most local regulations require the light transmittance of a window with sunscreen film to be no lower than 50%. Therefore, we assume that there is a semitransparent medium with a light transmittance of at least 50% on the optical path.
Smart car characteristics
Finally, we assume that the attacker knows the details of the target vehicle, so the best injection light path can be designed according to the vehicle's structure and microphone layout. Such knowledge can easily be obtained before launching an attack by visiting a 4S dealership or browsing relevant websites.
Reproduce and optimize light commands
In the following, we propose a new way to launch the light commands attack with lighter and cheaper equipment. We then present optimizations for our test environment.
Lightweight solution
We used a SYD1230 650-nm 5-mW laser diode, rated Class IIIa, as our laser source. This laser has an adjustable lens for focusing. The laser's output power is linearly related to its drive current, whereas most signal sources are voltage sources. We therefore use a diode driver board that supports analog modulation to convert voltage to current, so that the audio signal modulates the laser intensity. The laser beam was directed at the acoustic port of a MEMS microphone breakout board carrying an ADMP401 MEMS microphone. The signal was generated by a RIGOL DG1022 signal generator. Finally, we recorded the input audio signal and the output of the microphone using a RIGOL DS1022 oscilloscope. The circuit diagram of our solution is shown in Figure 3.
Figure 3. Circuit diagram of our solution.
We used the signal generator to superimpose a sine wave on the input voltage Ut of the voltage-to-current converter via amplitude modulation, following the equation below
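A minimal sketch of such a drive signal, assuming a DC-offset-plus-sine form analogous to the drive current used in the original Light Commands work: Ut(t) = U_DC + (U_pp/2)·sin(2πft), where U_DC keeps the laser diode above its lasing threshold. The symbols U_DC, U_pp, and f and the numeric values in the usage line are illustrative assumptions, not the authors' settings. The second function applies the same idea to an arbitrary voice waveform.

```python
import numpy as np


def am_drive_voltage(t, u_dc, u_pp, freq_hz):
    """Sine-tone drive: Ut(t) = U_DC + (U_pp / 2) * sin(2*pi*f*t)."""
    return u_dc + 0.5 * u_pp * np.sin(2 * np.pi * freq_hz * np.asarray(t, dtype=float))


def voice_drive_voltage(audio, u_dc, u_pp):
    """Map a voice waveform onto [U_DC - U_pp/2, U_DC + U_pp/2] by peak scaling.

    U_DC - U_pp/2 should stay above the laser threshold so the diode never
    switches off (an assumption about the operating point).
    """
    audio = np.asarray(audio, dtype=float)
    peak = np.max(np.abs(audio))
    if peak == 0:
        return np.full_like(audio, u_dc)  # silence: constant bias only
    return u_dc + 0.5 * u_pp * audio / peak


if __name__ == "__main__":
    fs = 48_000
    t = np.arange(fs) / fs  # one second of samples
    u = am_drive_voltage(t, u_dc=0.4, u_pp=0.2, freq_hz=1_000)  # hypothetical values
    print(f"drive voltage range: {u.min():.3f} V to {u.max():.3f} V")
```

Because the converter is linear, the laser current, and hence its optical power, follows the same offset-plus-signal shape.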
Automated testing platform
We then built an automated testing platform for further experiments. We use the sound card of a laptop computer (LG Gram 13Z990A) for both audio output and microphone sampling, and an ATTEN APS3005S-3D regulated DC power supply to provide a DC bias for the audio input. The laptop's headphone jack is an i-type four-pole 2.5 mm audio jack, whose structure is shown in Figure 4. We connected the left-channel pin of the four-pole plug to the negative terminal of the DC power supply and connected the positive terminal of the supply to the converter as its audio input. The circuit diagram is shown in Figure 5. In this case, VB = 0.4 V. We connected the audio output pin of the MEMS microphone to the MIC pin of the four-pole plug. The output of the microphone is as follows.
Figure 4. (a) i-type 2.5 mm male four-pole audio jack; (b) N-type.
Figure 5. Circuit diagram of the automated testing platform.

We use Python's PyAudio library to create two audio streams, one for playing voice commands and one for saving the sampled audio signal. The audio output and microphone input must then be switched to the headphone jack in the sound card's driver panel (here, SmartAudio 2.0 for Synaptics devices).
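A minimal sketch of the two-stream setup, assuming PyAudio's callback mode for recording and blocking writes for playback. The function names, file paths, 44.1 kHz rate, mono 16-bit capture, and 1024-frame chunk size are hypothetical choices; device selection is left to the system defaults, which the driver-panel switch described above is assumed to point at the headphone jack.

```python
import wave

CHUNK = 1024  # frames per buffer (assumed value)


def frames_to_wav(frames, channels, sample_width, rate, fileobj):
    """Write raw PCM buffers captured from an input stream into a WAV file object."""
    with wave.open(fileobj, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def play_and_record(command_wav, output_wav, rate=44100):
    """Play a command WAV on the default output while recording the default input."""
    import pyaudio  # lazy import so frames_to_wav works without audio hardware

    pa = pyaudio.PyAudio()
    recorded = []

    def on_input(in_data, frame_count, time_info, status):
        recorded.append(in_data)  # keep every captured buffer
        return (None, pyaudio.paContinue)

    # Stream 1: microphone capture in callback mode (starts immediately).
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=rate, input=True,
                  frames_per_buffer=CHUNK, stream_callback=on_input)
    with wave.open(command_wav, "rb") as src:
        # Stream 2: playback of the command, which drives the laser modulation.
        speaker = pa.open(format=pa.get_format_from_width(src.getsampwidth()),
                          channels=src.getnchannels(), rate=src.getframerate(),
                          output=True)
        data = src.readframes(CHUNK)
        while data:
            speaker.write(data)  # blocking write
            data = src.readframes(CHUNK)
        speaker.stop_stream()
        speaker.close()
    mic.stop_stream()
    mic.close()
    sample_width = pa.get_sample_size(pyaudio.paInt16)
    pa.terminate()
    with open(output_wav, "wb") as f:
        frames_to_wav(recorded, 1, sample_width, rate, f)
```

Recording in callback mode lets capture run on PyAudio's own thread while the main thread blocks on playback, so both streams stay active for the whole command.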
Finally, we use Baidu AI's speech recognition service to transcribe the sampled audio into text. We compare the returned text with the content of the original instruction to determine whether the attack succeeded.
Attack design
In this section, we describe how the benchmark voice commands were selected and how they were generated, and then explain the evaluation criteria for successful attacks. The attack process is illustrated in Figure 6.
Figure 6. Flowchart of attack process.
Command selection
Table 1. Details of voice command recording.
This command is a typical query command that does not require any hardware operation from the VC system. Its syllables are very clear, so it is relatively easy to recognize. We use it as the baseline for each test scenario.
Voice command generation
Today's text-to-speech (TTS) technology is mature, and anyone can easily convert a piece of text into speech. Therefore, even if an attacker cannot obtain a recording of the user's voice, the attacker can use a TTS system to generate any attack command needed. Moreover, most VC systems are speaker-independent, so even voice commands synthesized by an algorithm can be recognized and executed by the VC system.
We use the built-in TTS function of Google Translate to generate the voice commands above. Each command was repeated twice and normalized so that the overall volume of the audio clips was constant. Finally, we played these clips from the laptop through the headphone jack, which was connected to the converter and laser diode, while simultaneously recording the output of the MEMS microphone.
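The clip preparation described above (normalizing to a constant level and repeating each command twice) can be sketched as follows. Peak normalization, the 0.9 target level, and the 0.5 s silent gap between repetitions are assumptions, not the authors' stated settings; the TTS output is assumed to have already been loaded as a float array.

```python
import numpy as np


def normalize_peak(audio, target_peak=0.9):
    """Scale a clip so that its absolute peak equals target_peak."""
    audio = np.asarray(audio, dtype=float)
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio.copy()  # silent clip: nothing to scale
    return audio * (target_peak / peak)


def build_command_clip(audio, fs, repeats=2, gap_s=0.5, target_peak=0.9):
    """Normalize a TTS command and repeat it `repeats` times with silent gaps."""
    clip = normalize_peak(audio, target_peak)
    gap = np.zeros(int(round(gap_s * fs)))
    parts = []
    for i in range(repeats):
        if i > 0:
            parts.append(gap)  # silence between repetitions
        parts.append(clip)
    return np.concatenate(parts)
```

Normalizing every clip to the same peak keeps the laser modulation depth comparable across commands, so differences in success rate are less likely to come from loudness differences in the TTS output.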
Success evaluation criteria
Generally speaking, once the automatic speech recognition (ASR) system transcribes a voice command into text, the VCS executes the command based on the recognition result. Provided that the ASR system is speaker-independent, we argue that once the voice command is correctly recognized, the VC system can be considered to be manipulated by the attacker.
We play each instruction twice, upload the microphone recording to the ASR service for recognition, and compare the returned text with the original instruction. If at least one result matches, we count the attack as successful. The success rate is the percentage of successful attacks among all attacks carried out.
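The success criterion can be written as a short sketch. The transcripts are assumed to be plain strings already returned by the ASR service (no particular Baidu API is shown), and the case- and punctuation-insensitive exact match is an assumed matching rule.

```python
import re


def normalize_text(text):
    """Lower-case, drop punctuation, and collapse whitespace before comparison."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def attack_succeeded(transcripts, command):
    """An attack succeeds if any playback's transcript matches the command."""
    target = normalize_text(command)
    return any(normalize_text(t) == target for t in transcripts)


def success_rate(trials):
    """trials: list of (transcripts, command) pairs; returns the fraction of successes."""
    if not trials:
        raise ValueError("no trials")
    hits = sum(attack_succeeded(ts, cmd) for ts, cmd in trials)
    return hits / len(trials)
```

For example, `attack_succeeded(["I am not the front door", "Unlock the front door."], "Unlock the front door")` is true because the second playback matches, while a single misrecognized transcript counts as a failure.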
Evaluation of light commands attack on smart cars
In this section, we evaluate the performance of the light commands attack with respect to the sunscreen film covering the windowpane and the incidence angle, using the setup shown in Figure 7. A summary of the experimental results is given in Table 2. The experiments were conducted in a regular laboratory environment, with typical ambient noise from human speech, computer equipment, and air-conditioning systems.
Figure 8. Average sonographs of three commands with different media. Translucent media can filter high-frequency signals.
Table 2. Success rate of light commands attack in some scenarios.
Impact of sunscreen film
Light intensity is the key to successful light commands attacks. Therefore, the transparency of the medium plays an important role.
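The role of transmittance can be sketched with a toy model built on the threshold-then-linear response described in the Introduction: the microphone output is taken to be gain·max(0, P − P_th) for optical power P at the diaphragm. A film of transmittance T scales both the DC and the modulated part of the beam. The threshold, gain, and power levels below are hypothetical, and glass losses are ignored.

```python
import numpy as np


def injected_amplitude(transmittance, p_dc_mw, p_ac_mw, p_th_mw, gain=1.0, n=1000):
    """Peak-to-peak microphone output for a sine-modulated beam attenuated by a film.

    Toy model (assumed): output = gain * max(0, P - P_th), where the optical power
    reaching the diaphragm is P(t) = T * (P_DC + P_AC * sin(2*pi*t)).
    """
    if not 0.0 <= transmittance <= 1.0:
        raise ValueError("transmittance must be in [0, 1]")
    t = np.linspace(0.0, 1.0, n, endpoint=False)  # one modulation period
    power = transmittance * (p_dc_mw + p_ac_mw * np.sin(2 * np.pi * t))
    output = gain * np.maximum(0.0, power - p_th_mw)
    return output.max() - output.min()


if __name__ == "__main__":
    # Hypothetical operating point: 3 mW bias, 1 mW modulation, 1 mW threshold.
    for T in (1.0, 0.7, 0.5, 0.3):
        print(f"T = {T:.1f}: output p-p = {injected_amplitude(T, 3.0, 1.0, 1.0):.3f}")
```

In this model the injected amplitude falls in proportion to T while the attenuated beam stays above threshold, and falls faster once the troughs of the modulation drop below it, clipping the signal.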
The success rate and sonographs are shown in Table 2 and Figure 8. The sonographs are drawn from the averaged recordings, which are derived as follows.
From the results, we can see that both the success rate and the average sound intensity increase with the transmittance of the medium. This is because the dimmer the light reaching the diaphragm, the weaker the signal generated by the microphone and the harder it is for the ASR system to recognize. We further argue that the success rate is content-dependent. With dimmer light, commands with clearer syllables keep a success rate above 90% (e.g., “How is the weather in London”), while other commands, such as “Unlock the front door,” are frequently misrecognized as the near-homophone “I am not the front door.”
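One plausible way to compute averaged sonographs, assumed here rather than taken from the authors, is an element-wise mean of magnitude spectrograms over repeated recordings of the same command. The Hann window, 512-sample frames, and 256-sample hop are hypothetical choices.

```python
import numpy as np


def magnitude_spectrogram(x, frame_len=512, hop=256):
    """Hann-windowed magnitude STFT; rows are frequency bins, columns are frames."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))  # ensure at least one frame
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def average_spectrogram(recordings, frame_len=512, hop=256):
    """Element-wise mean of magnitude spectrograms, truncated to the shortest one."""
    specs = [magnitude_spectrogram(r, frame_len, hop) for r in recordings]
    n = min(s.shape[1] for s in specs)
    return np.mean([s[:, :n] for s in specs], axis=0)


def to_db(spec, floor=1e-10):
    """Convert magnitudes to decibels for plotting as a sonograph."""
    return 20 * np.log10(np.maximum(spec, floor))
```

Averaging magnitudes (rather than raw waveforms) avoids cancellation from small timing offsets between repeated recordings.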

Figure 7. Setup for our lightweight solution. Channels one and two of the DC power supply work in independent mode and provide the input offset and the VCC of the microphone breakout board. The voltage-to-current converter is powered by the 5 V output of the power supply.
Impact of incidence angle
Another factor that influences the light intensity reaching the diaphragm is the angle of incidence. Because light has an extremely short wavelength, it hardly diffracts or scatters around obstacles. Changing the incidence angle changes the shape and diameter of the laser spot on the microphone. Therefore, every time we adjust the incidence angle, we refocus the laser lens so that the beam spot falls entirely on the acoustic port of the microphone.
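The effect of tilting the beam on the spot can be sketched with idealized geometric optics, assuming a collimated circular beam of diameter d and a flat port surface. A tilt of θ stretches the spot into an ellipse with major axis d/cos θ and spreads the same power over an area larger by 1/cos θ. The 0.5 mm port diameter in the usage block is hypothetical.

```python
import math


def spot_axes(beam_diameter, incidence_deg):
    """Minor and major axes of the elliptical spot cast by a collimated circular
    beam on a flat surface tilted incidence_deg away from normal incidence."""
    if not 0 <= incidence_deg < 90:
        raise ValueError("incidence angle must be in [0, 90) degrees")
    theta = math.radians(incidence_deg)
    return beam_diameter, beam_diameter / math.cos(theta)


def irradiance_factor(incidence_deg):
    """Irradiance on the surface relative to normal incidence (cosine law)."""
    return math.cos(math.radians(incidence_deg))


def max_beam_diameter(port_diameter, incidence_deg):
    """Largest beam diameter whose elongated spot still fits inside a circular port."""
    _, major_per_unit = spot_axes(1.0, incidence_deg)
    return port_diameter / major_per_unit


if __name__ == "__main__":
    port = 0.5e-3  # hypothetical acoustic-port diameter (m)
    for angle in (5, 10, 15, 20):
        print(f"{angle:>2} deg: beam <= {max_beam_diameter(port, angle) * 1e3:.3f} mm, "
              f"irradiance x{irradiance_factor(angle):.3f}")
```

This is why the lens must be refocused at each angle: the larger the tilt, the tighter the beam has to be for the whole elongated spot to land on the port.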
The sonographs of this experiment are shown in Figure 9. They show that the recorded voice intensity drops much more sharply when the incidence angle increases from 15° to 20° than when it increases from 10° to 15°, causing a large part of the voice information to be lost. Considering only direct light, the light intensity on the diaphragm decreases as the incidence angle increases. Within 15°, the success rate of the attack is 100%, but at 20° it drops immediately to 0. Only a few words are recognized correctly (e.g., “weather” and “camera”). We also confirmed that the impact of the incidence angle is not content-dependent.

Figure 9. Average sonographs of three commands at different incidence angles.
Hardware-based protection
In this section, we propose two hardware-based defense strategies from two perspectives. The basic idea is to exploit the differences between sound and light, namely their wavelengths and the media through which they travel. Systematically verifying and optimizing these ideas requires considerably more work, which we leave to future work.
Improvement of microphone structure
Defenses against light commands can start with improving the structure of the microphone. Because light commands attacks have very strict requirements on the angle of incidence, we can block the light by placing opaque barriers in its path while leaving gaps for sound waves to pass through. 23
As in the design shown in Figure 10, the barrier can be rather small, as long as it blocks the direct light path to the diaphragm.
Figure 10. Design of a micro-electro-mechanical system microphone with light-blocking barriers. 11
Position and design of microphone port
Smart cars have windows on every side except the floor. Therefore, to prevent microphones from being attacked, the microphone port can be designed to face downward. This should not affect recording quality, but it prevents the attacker from injecting laser light into the microphone from outside. Figure 11 shows another method: lengthening the tunnel between the acoustic port on the device's body and the internal microphone. We can estimate the angle range φ within which a successful attack is possible as follows.
Figure 11. An acoustic port with a long tunnel can decrease the feasibility of light commands attack. The laser beam cannot reach the microphone, while sound waves can.
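Under the simplifying assumption of a straight cylindrical tunnel of diameter D and length L, a straight ray can travel from the outer opening to the inner end only if its angle to the tunnel axis satisfies tan φ ≤ D/L, i.e., φ_max = arctan(D/L). The sketch below computes this bound and its inverse; it ignores reflections inside the tunnel, and all dimensions are hypothetical.

```python
import math


def max_attack_angle_deg(tunnel_diameter, tunnel_length):
    """Largest angle to the tunnel axis (degrees) at which a straight ray can still
    pass from the outer opening to the inner end of a straight cylindrical tunnel."""
    if tunnel_diameter <= 0 or tunnel_length <= 0:
        raise ValueError("dimensions must be positive")
    return math.degrees(math.atan(tunnel_diameter / tunnel_length))


def min_tunnel_length(tunnel_diameter, target_angle_deg):
    """Tunnel length that limits the direct-light cone to +/- target_angle_deg."""
    if not 0 < target_angle_deg < 90:
        raise ValueError("target angle must be in (0, 90) degrees")
    return tunnel_diameter / math.tan(math.radians(target_angle_deg))
```

For example, limiting the direct-light cone to ±10° with a 1 mm wide tunnel requires a tunnel roughly 5.7 mm long, since 1/tan 10° ≈ 5.67.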


These solutions should be considered, designed, and implemented during the design and manufacturing process, because a product is difficult to modify once it has been produced.
Conclusion and future work
In this article, we redesign and implement the light commands attack against VCSs in smart cars. We regard the light commands attack as an effective method of injecting malicious commands into microphones in sound-isolated systems such as smart cars. We also propose a lighter and cheaper setup that uses a common voltage-to-current converter to drive the laser diode. We then showed that sunscreen films on the market provide only limited protection against light commands attacks. We also demonstrated that the light commands attack is extremely sensitive to the angle of incidence, which means that adding opaque barriers can make attacks difficult to carry out. To defend against light commands attacks, we propose several hardware-based defense strategies to protect in-car VCSs and highlight that reshaping the tunnel from the device's surface to the internal microphone is the most effective defense that does not hurt the usability of the VCS.
The light commands attack exposed a fatal flaw in MEMS microphones and revealed a new photoacoustic phenomenon. The physical principle behind the attack is still unclear, and several hypotheses have been proposed, such as the photoelectric effect, the photothermal effect, and light pressure. A better understanding of the underlying physics could not only provide new ideas for transduction attacks but also offer a new perspective on signal transmission. We leave this to future work.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Key R&D Program of China (2018YFB0904900, 2018YFB0904904).
