Abstract
The Input Method Editor (IME) is an indispensable component of current smartphones. With its assistance, the number of key presses is reduced, and non-Latin characters can be entered. Furthermore, modern IMEs integrate several personalized features, such as reordering suggestion lists and predicting the next words based on the user’s input history. Such optimization improves the user experience but turns the IME dictionary into a pool of private user data. Previous works have discussed the privacy risks posed by malicious IMEs. Such IMEs can indeed cause security and privacy issues if installed by ordinary users, but their impact is limited because the majority of IMEs are well-behaved. However, whether legitimate IMEs are bullet-proof has not been answered before.
In this paper, we make the first attempt to study the security implications of IME personalization and the back-end infrastructure on Android devices. We identify a critical vulnerability in the Android KeyEvent processing framework, which can be exploited to launch a cross-app KeyEvent injection (CAKI) attack and bypass the app-isolation mechanism. By abusing this design flaw, an adversary can harvest entries from the personalized user dictionary of an IME through an ostensibly innocuous app that asks only for common permissions. Our evaluation over a broad spectrum of Android OSes, devices, and IMEs suggests that this issue should be fixed immediately. All Android versions we examined (from the very old 2.3.4 to the latest 6.0.1) and most IME apps we surveyed (11 out of 18) are vulnerable. Users’ private information, such as contact names and locations, can be easily exfiltrated. Up to hundreds of millions of mobile users are under this threat. To mitigate this security issue, we propose a practical defense mechanism that augments the existing KeyEvent processing framework without forcing any change to IME apps.
Introduction
The smartphone is becoming the primary device for handling people’s daily tasks, such as making calls, sending and receiving messages, and surfing the Internet. Of particular importance in supporting these features are input devices. Among them, the keyboard, either a hardware keyboard integrated into the phone or a soft keyboard displayed on the touchscreen, receives a significant volume of users’ input. Most of these keyboards are tailored for users of Latin-script languages. Users of other languages, such as Chinese and Japanese, have to use an Input Method Editor (IME) to type non-Latin characters. In fact, numerous IME apps3
We use IME and IME app interchangeably in this paper.
The wide adoption of IME, however, does not come without cost. Previous research has raised the privacy concerns with

Smart IME on Android.

Warning message when activating a new IME.
For this particular case, because of the discrepancy between the point of checking and the point of delivering, IME is turned into the final receiver on the fly when KeyEvent is delivered. Therefore, the security check at the beginning is bypassed. Since this issue exists in the system layer, all IMEs are potentially under threat. The security implication of the above attack on IME apps may not be so obvious, but it actually can lead to severe privacy leakage. The nature of this attack is to simulate user keystrokes on an IME and obtain the personalized suggested words. For example, when a user strokes “
We implemented a proof-of-concept malicious app named
The preliminary version of this work revealed the CAKI vulnerability and the corresponding exploitation approach [15]. In this extended version, we carry out a more systematic study of this system-level design flaw and the privacy leakage threats arising from personalized features. Specifically, we propose a quantification method to measure the privacy leakage so that the potential security risks can be better understood. We also analyze the back-end framework supporting third-party IMEs to complete the in-depth investigation. The methods of hiding attacks are improved, and additional evaluation results (an extended experiment involving nearly all mainstream IMEs for English and Chinese) are provided to reflect the evolution of the Android system.
We reported our findings to Google’s Android Security Team, and they acknowledged the problem we have raised (bug tracking ID assigned:
New Vulnerability. We discovered a fundamental vulnerability in the Android KeyEvent processing framework that leads to the CAKI attack.
New Attack Surface. We show that, by launching the CAKI attack, an attacker can steal a variety of private information from the IME dictionary. Unlike previous IME-based attacks, ours is the first to exploit benign IMEs.
Systematic Study. We carry out a systematic study of the relationship between the IME user dictionary and privacy leakage. Several popular IMEs are analyzed to identify their optimization features and data sources. The privacy leakage risks are also quantified based on information theory.
Implementation, Evaluation, and Defense. We implemented the attack scheme and developed a proof-of-concept attacking app,
In this section, we provide the background of the personalized user dictionary and Android back-end framework supporting third-party IMEs. After that, the adversary model used in this paper is given.
Personalized user dictionary and privacy
IMEs have emerged to support users of different languages, such as English and Chinese. A smartphone is usually shipped with pre-installed IMEs, but users can also download and use other IME apps. Smart IMEs have gained massive popularity: a top IME such as Sogou Mobile IME [39] has more than 478 million active users [44].
The IMEs used today have evolved from plain soft keyboards into versatile input assistants with many new features that improve the typing experience. The goal of these features is to reduce the number of keys a user needs to type. For instance, modern mainstream IMEs (such as SwiftKey [42], TouchPal [45], and Sogou Mobile IME) implement dynamic adjustment of the suggestion order, contact name suggestions, next-word prediction, and new-word saving to provide suggestions for the current or subsequent words. Hence, a user can select a word from the suggestions without typing the complete text. We call these “optimization features” and elaborate on them below:

Dynamic order adjustment.

Contact names suggestion.

Next-word prediction.

New-word saving.
To summarize, all the above features are driven by the user’s personalized information, such as the input history. Furthermore, when the permissions guarding the user’s sensitive data are granted, IMEs can customize their dictionaries using various data sources, including SMS, emails, and even social network data. It is very likely that the names of the user’s family members and friends, as well as nearby locations, are recorded by the IME after it has been used for a while. We manually examined the settings and permissions of 18 representative mainstream IMEs (most of them have over 1 million installations, and all are available on Google Play) and summarize their data sources in Table 1. Clearly, the personalized dictionary should be considered a private asset and kept well protected. In fact, most of the IME apps we surveyed keep their dictionaries in the phone’s internal storage, which is accessible only to the owner app.
Data sources of mainstream IMEs for optimization features
IMEs are usually pre-installed, but the pre-installed ones may not satisfy users of different languages. Therefore, Android provides an extensible Input Method Framework (IMF) that allows users to use alternative third-party IMEs. There are three primary parties involved in the IMF [18]:
The
An IME implements a particular interaction model that allows the user to generate text. The system binds to the currently active IME, causing it to be created and run, and tells it when to hide and show its UI.
Multiple client applications arbitrate with the
Figure 7 explains the simplified workflow of Android IMF. There are three main steps:

Android input method framework.
When a view (
This view requests the activated IME show its UI (soft keyboard).
The user starts to strike keys on the keyboard. These keystrokes are captured by the IME, and composed texts are committed to the bonded view (client application).
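The three steps above can be pictured with a small, self-contained Python model. It is only an illustrative sketch: the class names `ToyView`, `ToyIme`, and `ToyManager` are hypothetical stand-ins rather than Android APIs, and text composition is reduced to echoing the typed keys.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyView:
    """A client text field (hypothetical stand-in for a focused view)."""
    name: str
    text: str = ""


@dataclass
class ToyIme:
    """A toy IME: collects keystrokes and commits text to the bound view."""
    visible: bool = False
    bound_view: Optional[ToyView] = None
    composing: list = field(default_factory=list)

    def on_key(self, key: str) -> None:
        # Keystrokes are captured by the IME, not by the client view.
        self.composing.append(key)

    def commit(self) -> None:
        # Step 3: the composed text is committed to the bound client view.
        if self.bound_view is None:
            raise RuntimeError("no view is bound to the IME")
        self.bound_view.text += "".join(self.composing)
        self.composing.clear()


class ToyManager:
    """Toy arbiter between client views and the single active IME."""

    def __init__(self, ime: ToyIme) -> None:
        self.ime = ime

    def focus(self, view: ToyView) -> None:
        # Step 1: a view gains focus and is bound to the active IME.
        self.ime.bound_view = view
        # Step 2: the view asks the IME to show its soft keyboard.
        self.ime.visible = True
```

In this model, calling `focus` on a view, feeding keys through `on_key`, and then calling `commit` leaves the typed text in the view, which mirrors the binding relationship that the rest of the paper relies on.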
Supported by this framework, a user can download and install other IME apps. After installing the desired IMEs, she can select which one to use from the system settings and use it across the entire system; only one IME may be enabled at a time [13].
The adversary we envision here is interested in the dictionary entries of IME deemed private to the user, like contact names, and aims to steal and exfiltrate them to her side. We assume the user has installed a victim IME which is “
“Benign” means this IME exercises due diligence in protecting user’s private data. The measures taken include keeping its dictionary in app’s private folder (internal storage).4
Android designs a global provider
“Smart” means this IME can learn unique word-using habits and build a personalized user dictionary based on user’s input history, contacts, and so forth.
At the same time, we assume this user has downloaded and installed a malicious app named
We identify a new vulnerability lying under Android OS, allowing an adversary to launch
Android KeyEvent processing flow
The internal mechanism of input processing in Android is quite sophisticated, and here we only focus on how KeyEvents5
IME accepts another kind of input event – MotionEvent [28], coming from the soft keyboard (see Fig. 1). Its processing flow is different and not covered in this paper.
Our descriptions are based on Android 6.0.1_r67 [3] since this version holds the largest market share among all Android versions. For other versions, the flows are mostly the same, and only the source code paths may differ.

Android KeyEvent processing framework and the CAKI vulnerability.

Code block for permission checking in InputDispatcher
This routine first verifies whether the event is generated by a hardware device (checking
An input event passing the above check will be dispatched via a system IPC mechanism
Since simulated key presses could come from a malicious app, Android enforces much stricter checking on them. Still, the checking routine is not flawless. Below, we elaborate on a critical vulnerability in this routine (illustrated in Fig. 8):
The KeyEvent is originated from the hardware keyboard.
The KeyEvent injector and receiver are the same.
The KeyEvent injector has been granted the
Since
A non-system-level malicious app (named
This CAKI vulnerability can be attributed to a significant class of software bugs, namely
We found only one vulnerability disclosure by Palo Alto Networks’ researchers [53] regarding TOCTTOU on Android, which was reported in March 2015.
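The discrepancy between the point of checking and the point of delivery can be illustrated with a toy model. The sketch below is not Android code: `ToyKeyEvent`, `toy_permission_check`, `toy_dispatch`, and the app names are hypothetical, and the three accepted conditions simply mirror the ones listed above. It shows how a check made against the recorded receiver passes while the event is later rerouted to an active IME.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKeyEvent:
    key: str
    injector: str   # app that created the event, or "hardware"
    target: str     # receiver recorded when the event is injected


def toy_permission_check(event: ToyKeyEvent, privileged: set) -> bool:
    """Time of check: accept if any of the three conditions holds."""
    from_hardware = event.injector == "hardware"
    same_app = event.injector == event.target
    has_permission = event.injector in privileged
    return from_hardware or same_app or has_permission


def toy_dispatch(event, privileged, ime_active, ime="ime_app"):
    """Time of use: the receiver is resolved again after the check passed."""
    if not toy_permission_check(event, privileged):
        return None  # event rejected
    # An active IME receives the key first, so the final receiver can
    # differ from the one that was checked (the TOCTTOU gap).
    return ime if ime_active else event.target
```

An unprivileged app that injects a key into itself passes the check in both cases, yet with `ime_active=True` the model returns `"ime_app"` as the receiver, whereas a direct injection that names the IME as its target is rejected.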
In this section, we describe the design and implementation of the proof-of-concept attacking app
After

CAKI attack flow.
Given the enormous size of an IME dictionary (hundreds of thousands of words), the most prominent challenge is how to efficiently identify the entries that contain the user’s private information. These entries could be added from the user’s typed words, imported from the user’s private data (e.g., contact names), or reordered according to the user’s typing history. We refer to such entries as private entries here. By manually testing several popular IME apps, we made one important observation about these private entries: they usually show up after 2 or 3 letters/words are typed and are placed in the first or second position of the suggestion list. In other words, by enumerating a small number of letter/word combinations, a large number of private entries can be obtained. We design two attack modes based on this insight:
The generated list contains both private entries and entries unrelated to customization, so the latter need to be filtered out. To this end, we carry out a differential analysis. We run
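The enumeration and the differential analysis can be sketched on in-memory word lists. Everything below is a simplified illustration under stated assumptions: `make_toy_suggester` is a hypothetical stand-in for querying an IME, the alphabet is plain lowercase ASCII, and only the first two suggestions per prefix are kept, following the observation that private entries tend to appear in the first or second position.

```python
from itertools import product
from string import ascii_lowercase


def enumerate_prefixes(max_len=3, alphabet=ascii_lowercase):
    """Yield every prefix of 2..max_len letters."""
    for n in range(2, max_len + 1):
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)


def harvest(suggest, prefixes, top_k=2):
    """Collect the first top_k suggestions returned for each prefix."""
    found = set()
    for prefix in prefixes:
        found.update(suggest(prefix)[:top_k])
    return found


def make_toy_suggester(ranked_words):
    """Stand-in suggester: words starting with the prefix, in rank order."""
    def suggest(prefix):
        return [w for w in ranked_words if w.startswith(prefix)]
    return suggest


def private_entries(fresh_suggest, victim_suggest, prefixes):
    """Differential analysis: keep what only the personalized IME returns."""
    prefixes = list(prefixes)
    return harvest(victim_suggest, prefixes) - harvest(fresh_suggest, prefixes)
```

With a fresh list `["hello", "help", "world"]` and a personalized list that ranks a learned name such as `"helga"` first, the set difference over the prefixes `"he"` and `"wo"` isolates the learned entry.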
Attack in stealthy mode
When
Yet, common apps cannot be brought to the foreground when the phone is locked. On Android, apps’ windows are managed by the system service

Windows layer relationship.
Our attack is effective not only against IMEs for English but also against IMEs for non-Latin languages. Users who type non-Latin words have to rely on alternative IMEs, since the characters of their language are not directly mapped to English keys. In this section, we present a case study on attacking Chinese IMEs. It turns out that only a few adjustments to the enumeration algorithm are needed, and private entries can be obtained efficiently despite the complexity of the language.
The initial set:
Each Chinese character has a unique syllable, but one syllable is associated with many distinct characters. Each Chinese word is composed of multiple characters (usually two to three). An example is shown in Fig. 12. The character combination poses a big challenge in harvesting meaningful Chinese entries: a prefix (e.g., “

Example of Chinese Pinyin.

Pinyin: One-to-many mapping.

Enumerating 2-character words of Pinyin-based IMEs
We implemented a prototype of the attack app
Scope of attack
Our newly discovered CAKI vulnerability derives from a design flaw in the Android framework. Thus, in theory, all Android devices suffer from this vulnerability. To corroborate this hypothesis, we tested 8 different versions of Android OS on 6 physical Android phones. The tests show that all versions, ranging from the very old (2.3.4) to the latest (6.0.1), are vulnerable without exception. The vulnerable phones and the corresponding OS versions are listed in Table 2.
Evaluation on different Android OSes
Also, our attack is not limited to a specific language or a specific IME. All smart IMEs equipped with optimization features are potentially vulnerable. We tested our attack on 18 popular IMEs, and 11 of them are vulnerable, as shown in Table 3. The attack did not succeed on the other 7 IMEs, and we further explored the reasons for these failures. After manual code analysis and event instrumentation testing, we confirmed two main causes: (1) KeyEvents cannot trigger the word suggestion feature (on Fleksy and IQQI); (2) a suggestion can only be committed through screen operations (on Swype, Google, Kika, Emoji, and Ginger). We conclude that these IMEs respond only to taps on the soft keyboard (MotionEvent) and ignore key presses simulated by apps (KeyEvent). However, they may have compatibility issues, since hardware keyboards are not well supported. Therefore, we believe this lucky escape is probably due to a design flaw rather than to deliberately enforced protection.
Evaluation on popular IMEs
Remarks: The data of installations is based on Google Play.
A successful attack also means it should not be found by anti-virus software. We tested five leading mobile anti-virus apps, including AVG, McAfee, Avira, Dr. Web, and ESET. After executing threat scanning, none of them reported
Experiment on word completion attack mode
In this mode,
The experiments have followed the IRB rules, and all human subjects fully understood the privacy implication of the experiments and agreed to participate.
Word Completion Attack: user study
Remarks: The native language of
All the participants installed a modified version of
Clearly, plenty of sensitive information would be leaked if the CAKI vulnerability were exploited by real attackers. On average, 58.2% of the extracted words are indeed personalized. The ratio of personalized entries is associated with the frequency of IME usage, both in principle and in our experimental results. Figure 13 illustrates this rough positive correlation. The only exception is
From the statistics in Table 4 (plotted in Fig. 14), the most obvious source of information leakage is contact names, which are definitely sensitive to users. In addition, according to the volunteers’ feedback, the category “other personalized information” includes the names of installed apps, watched films, friends’ nicknames, frequently used slang expressions, the names of nearby restaurants, and so forth. This shows that a wide variety of information is leaked.

Word Completion Attack: personalized entries vs. duration of IME usage.

Word Completion Attack: category of information leakage.
The actual time overhead depends on both the implementation of the IME app and the performance of the phone’s hardware. We measured it on a Samsung Galaxy S3 and set the waiting period to 70 ms, a value determined beforehand through manual testing. We further measured the time consumption for all 2-initial and 3-initial combinations, which together cover most common Chinese words [26]. Injecting all 2-initial combinations against Sogou Mobile IME takes 221 s in total. Battery consumption is also low, costing less than 1% of total battery life. Injecting all 3-initial combinations can be done within two and a half hours. Considering that the attack can be carried out while the user sleeps, this is still a reasonable duration.
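As a rough sanity check on these figures, note that the number of injections grows exponentially with the number of initials. The sketch below assumes a hypothetical initial set of 23 symbols (the size of the set is an assumption made here, not a value stated in this section) and derives the average cost per combination implied by the reported totals. The resulting per-combination times are estimates, not measurements.

```python
def combination_count(num_initials: int, length: int) -> int:
    """Number of ordered combinations of `length` initials."""
    return num_initials ** length


def seconds_per_combination(total_seconds: float, num_initials: int, length: int) -> float:
    """Average time per combination implied by a reported total."""
    return total_seconds / combination_count(num_initials, length)


ASSUMED_INITIALS = 23  # assumption for illustration only

# Reported total of 221 s for all 2-initial combinations.
two = seconds_per_combination(221, ASSUMED_INITIALS, 2)

# Reported upper bound of two and a half hours for all 3-initial combinations,
# so this value is an upper bound on the average cost as well.
three = seconds_per_combination(2.5 * 3600, ASSUMED_INITIALS, 3)
```

Under this assumption there are 529 two-initial and 12,167 three-initial combinations, which puts the implied average cost at roughly 0.42 s and at most about 0.74 s per combination, respectively. A different initial set would change these estimates accordingly.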
In this mode,
In a real-world scenario, an IME is customized by the text a user inputs or information left by the user. Likewise, for each virtual user, we compile the text she could enter and dump it to IME. In all, we create five users (labeled as
Next-word Prediction Attack: simulation experiment
Since TouchPal can read messages and customize itself, we dump the collected blog content into the SMS outbox of the test phone (Samsung Galaxy S3) using an Android app developed by ourselves. We use one paragraph to fill one text message.
Now, TouchPal can proceed to customize its dictionary. We tick the options “
When a predicted word is selected, TouchPal will prompt a new predicted word. Hence, a user can type one word and continuously choose the words provided by TouchPal to build a long phrase. We leverage this feature to carry out 3-level prediction attack. For example,
For a sequence of injected keys, we compare the phrase generated from fresh IME (left-side of “⟶”) and
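The chained selection of predicted words, and the comparison between a fresh and a personalized IME, can be sketched as follows. This is a toy model under stated assumptions: `make_toy_predictor` is a hypothetical dictionary-backed predictor, not the prediction engine of any real IME, and the top-ranked prediction is always the one selected.

```python
def chain_predictions(predict, first_word, levels=3):
    """Start from one typed word and keep accepting the top prediction."""
    phrase = [first_word]
    for _ in range(levels):
        candidates = predict(phrase[-1])
        if not candidates:
            break  # the predictor has nothing to offer for this word
        phrase.append(candidates[0])
    return " ".join(phrase)


def make_toy_predictor(bigrams):
    """Stand-in predictor backed by a dict: word -> ranked next words."""
    def predict(word):
        return bigrams.get(word, [])
    return predict


def differs(fresh_predict, personalized_predict, first_word, levels=3):
    """Return (fresh phrase, personalized phrase) if they differ, else None."""
    fresh_phrase = chain_predictions(fresh_predict, first_word, levels)
    personal_phrase = chain_predictions(personalized_predict, first_word, levels)
    if fresh_phrase != personal_phrase:
        return (fresh_phrase, personal_phrase)
    return None
```

A pair returned by `differs` corresponds to one fresh-versus-personalized comparison: the first element is what a fresh IME produces for the typed word, and the second is what the customized IME produces.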
Next-word Prediction Attack: examples of private entries –
For
Next-word Prediction Attack: examples of private entries –
In this section, we propose some possible defense solutions and discuss how to fix the vulnerability described in this paper (Section 6.1). Also, we discuss the possibility of attack utilizing MotionEvent injection (Section 6.2). In addition, we try to measure the severity of our proposed attack through a quantitative method (as supplementary materials for interested readers, see Appendix).
Defense
Our proposed attack,
Fixing this vulnerability is indeed a non-trivial task due to the highly sophisticated design of Android. First, adding a new permission would not help to circumscribe such attacking behavior. For example, injecting KeyEvents into the app itself should remain permitted for purposes such as automated testing, unless an IME is involved in the process. Yet, there is no way to ensure this when the app is installed. Second, it is also infeasible to merely modify the IME app code to reject all injected KeyEvents, since the injections from system-level apps owning the
To mitigate this threat, we propose to augment the current KeyEvent processing framework. Currently, the information about the KeyEvent sender is limited: it only tells whether the KeyEvent was injected by an app or came from the hardware keyboard, which is too coarse-grained. We argue that the identity of the source app (i.e., package name and signature) should be enclosed in the KeyEvent as well, which can be achieved by adding a new field to its data structure. Before dispatching a KeyEvent, Android OS automatically attaches the sender’s identity to it. Before forwarding KeyEvents to the IME, Android OS verifies the sender and discards the injected KeyEvents if the sender is neither system app owning the

Android KeyEvent processing framework with defense logic.
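The proposed check can be modelled in a few lines. The sketch below is a simplified illustration, not a patch to Android: `TaggedKeyEvent`, `attach_sender`, and `forward_to_ime` are hypothetical names, the sender identity is reduced to a package name, and the set of trusted senders is an assumed policy input standing for the senders that the framework would allow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedKeyEvent:
    key: str
    sender: Optional[str]  # package name attached by the OS; None = hardware keyboard


def attach_sender(key: str, sender_package: Optional[str]) -> TaggedKeyEvent:
    """Modelled as done by the OS at dispatch time, so apps cannot forge it."""
    return TaggedKeyEvent(key=key, sender=sender_package)


def may_reach_ime(event: TaggedKeyEvent, trusted_senders: frozenset) -> bool:
    """Gate applied just before a KeyEvent is forwarded to the IME."""
    if event.sender is None:  # genuine hardware keystroke
        return True
    return event.sender in trusted_senders


def forward_to_ime(events, trusted_senders):
    """Drop injected events from untrusted senders; keep the rest in order."""
    return [e for e in events if may_reach_ime(e, trusted_senders)]
```

Because the gate sits in the framework rather than in each IME, IME apps need no changes in this model, and self-injection for testing still works as long as the event is not routed to the IME.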
Furthermore, we examined other possible countermeasures, but all of them come with a loss of usability or compatibility. One possible solution is to prohibit the IME from being invoked when the phone is securely locked, but this would disable the quick-reply feature of the default SMS app and third-party instant messaging (IM) apps. We could also force the IME to commit words to text controls only if the word displayed on the touchscreen is tapped, but this would block input from the hardware keyboard.
From the perspective of the data cache, the source of the privacy leakage is the personalized IME dictionary. Developers should therefore avoid storing sensitive information in such a dictionary. However, defining what counts as “sensitive” is a challenge. The balance between usability and security is one of the eternal conundrums in software development.
In this paper, our attack is based on cross-app KeyEvent injection. One straightforward question is whether cross-app MotionEvent (touch event) [28] injection is possible. Based on our exploration of the internal mechanism of Android touch event dispatching, we see no possibility of exploiting MotionEvent. The fundamental reason is that IME apps do not have a higher priority to receive dispatched MotionEvents, which is different from the flow for KeyEvent. In detail, when the system dispatches a MotionEvent, a similar security check (like the one for KeyEvent illustrated in Fig. 8) is deployed to prevent cross-app MotionEvent injection. Though a malicious app is allowed to inject MotionEvents into itself, IME apps will never become the final receiver, because the location information (x and y coordinates) contained in the MotionEvent explicitly specifies the final receiver.
Related works
To the best of our knowledge, our work makes the first attempt to evaluate the security implications of IME personalization and the back-end infrastructure on Android devices. None of the previous work on IMEs has addressed security concerns from this perspective. We classify related work into three categories based on the previous discussions: (1) IME security issues, (2) key-logging attacks, and (3) untrusted input.
IME security issues
According to Android’s official documents [13,18], the Input Method Framework (IMF) has several security features: (1) the system can directly access an IME’s interface via the
IMEs can collect all user-typed text, and the user’s privacy is breached if a malicious IME sends out the collected key presses. Issues have also been found in built-in IMEs. A recently published post about a vulnerability in Samsung’s built-in IME shows that a remote attacker capable of controlling a user’s network traffic can manipulate the keyboard update mechanism on Samsung phones and execute code as a privileged (system-level) user on the target’s phone [51]. The exploit requires no user interaction. Even if the vendor has fixed this kind of issue, such security problems deserve continued attention.
Besides built-in IMEs, the Android IMF also allows the use of third-party IMEs, which introduces potential risks. In fact, questionable behaviors of IMEs have been observed in the wild [9,34,35]. Also, Cho et al. [11] demonstrated that malicious IMEs on Android could effectively steal users’ sensitive keystrokes. Hence, Android users should be aware of this risk, install third-party IMEs only from the official app store, and even then make sure the source is legitimate. To prevent such information leakage, Chen et al. [10] propose I-BOX, an app-transparent oblivious sandbox that minimizes sensitive input leakage by confining untrusted IME apps to predefined security policies.
In our work, we identify an entirely different avenue for abusing IMEs: rather than enticing users to install a malicious IME, an adversary can exfiltrate sensitive information through a new system design flaw and a novel IME dictionary probing technique.
Keystroke inference attacks
Key-logging on mobile devices is the action of recording (logging) the keystrokes on a virtual or physical keyboard, typically covertly, so that users are unaware that their actions are being captured. Suenaga [41] and Mohsen et al. [27] studied the key-logging threats of malicious IMEs on the Windows and Android platforms, respectively. On Android, a non-system app cannot obtain users’ keystrokes directly. However, previous works show that it is possible to infer keystrokes through various side channels.
Phone motions can also be exploited to detect keystrokes. More specifically, a touch on the screen, especially on the soft keyboard, causes vibrations, and touching different positions introduces distinctive vibration patterns. Previous works monitor motion sensors such as accelerometers to collect vibration statistics and infer which keys are pressed [6,7,25,31,32,54]. For instance, TouchLogger [7] is an Android app that extracts features from device orientation data to infer keystrokes. This work claimed that it could correctly infer more than 70% of the keys typed on a number-only virtual keyboard. TapLogger [54] can also stealthily monitor the movement and gesture changes of a smartphone using its onboard motion sensors, so that it can infer the user’s inputs according to the learned patterns. TextLogger [32] utilizes the shared-memory side channel to detect window events and tap events of a soft keyboard, which enables the inference of long user inputs. More recently, such key-logging threats have been extended to wearable devices. Liu et al. [24] and Wang et al. [48] were the first to demonstrate the possibility of inferring a user’s key presses through a smartwatch.
Besides, other sources are also exploited for keystrokes inference, such as microphone [30], Wi-Fi information [23], camera [17,38], and light sensor [40]. For example, a combination of stereoscopic microphones and gyroscopes is used to infer users’ keystroke when they tap on a soft keyboard [30]. In the work of Li et al. [23], the adversary can exploit the strong correlation between the channel state information (CSI) fluctuation and the keystrokes to infer the user’s number input. Further, Simon and Anderson [38] demonstrate that video camera and microphone can be used to infer PINs entered on a number-only soft keyboard on a smartphone. In particular, the microphone can detect touch events, while the camera is used to estimate the smartphone’s orientation, and correlate it to the position of the digit tapped by the user. Also, Spreitzer [40] shows the light sensor employed in today’s mobile devices actually represents a new type of side channel that leaks the user’s input.
In comparison with the existing works, our work steals user’s input history in plain text without inference, and a significant amount of typed text can be unveiled in a short time.
Untrusted input
A plethora of key functionalities in mobile devices are driven by user input, and the modules handling such data are usually granted very high privileges. A natural path for a malicious app to elevate its privileges is to impersonate a human and inject false input.
Besides direct leakage through IMEs, other sources or tools can also serve as untrusted input to launch attacks. Diao et al. [14] discovered that an adversary could inject prerecorded voice commands into the built-in voice assistant module (Google Voice Search) of Android and bypass permission checks. According to the experiments in that paper, in theory all Android (4.1+) devices equipped with Google Services Framework can be affected by the GVS-Attack. Jang et al. [21] investigated the accessibility (a11y) support frameworks of popular desktop and mobile platforms and identified a number of system vulnerabilities in handling user input.
In our work, we identify a new channel to inject fake input and bypass the security checking. We believe such threat is not yet over and encourage future research in identifying other exploitable sources and building better input validation mechanisms.
Conclusion
In this paper, we identify a new cross-app KeyEvent injection vulnerability against IMEs installed on Android devices. By exploiting such flaw, an adversary can infer words frequently used by a user or coming from other sensitive sources. We implement
Acknowledgments
We thank anonymous reviewers for their insightful comments. We also thank Fenghao Xu for helpful discussion. This work was partially supported by National Natural Science Foundation of China (NSFC) under Grant No. 61572415, Hong Kong S.A.R. Research Grants Council (RGC) Early Career Scheme/General Research Fund No. 24207815 and 14217816, Guangzhou Key Laboratory of Data Security and Privacy Preserving, Guangdong Key Laboratory of Data Security and Privacy Preserving, and National Joint Engineering Research Center of Network Security Detection and Protection Technology.
Quantifying the information leakage
To better understand how severe our proposed attack could be, it is necessary to conduct a quantitative study of the information leaked through this attack. Without loss of generality, we build the analysis model on information theory [12] by calculating the change in entropy (i.e., uncertainty) of the data in the personalized user dictionary before and after the attack.
We assume that adversary can get the default non-personalized dictionary from a freshly installed victim IME. Let
To the default suggestion list
To the personalized suggestion list
As a result, the leaked information
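The general idea of measuring leakage as an entropy change can be illustrated with the standard Shannon entropy. The sketch below is only one plausible reading of such a model and does not reproduce the exact formulas of this appendix: the `prior` and `posterior` distributions are hypothetical attacker beliefs about which candidate word a suggestion slot contains, before and after the attack.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


def leaked_bits(prior, posterior):
    """Entropy reduction between the attacker's prior and posterior beliefs."""
    return shannon_entropy(prior) - shannon_entropy(posterior)
```

For example, if an attacker initially considers four candidate words equally likely for a slot and the attack reveals the actual word, the uncertainty drops from 2 bits to 0 bits, so 2 bits are leaked for that slot in this simplified model.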
