Abstract
Despite being an effective mode of communication, Computer-Mediated Communication (CMC), particularly text-based exchange, still has inherent limitations compared to in-person communication. The objective of the current study is to review previous studies that identified and addressed these challenges, including the difficulty of effectively embedding emotional information in text. Previous studies demonstrated that changes in text properties can deliver emotional information and that the two phases of text encoding can be completed simultaneously through interaction design. The study also suggests future research to identify the emotional meaning of text features and to develop novel applications using machine learning technologies.
Introduction
Communication, whether spoken or written, has two major cue structures: verbal and nonverbal. Academics posit that nonverbal cues such as eye contact, facial expression, gestures, and tone of voice serve to transmit affective information, whereas verbal cues are intended to convey cognitive content (Knapp et al., 2013). Spoken and written communication have distinct purposes and characteristics in human interaction. Spoken communication conveys information and thoughts by voice, either face-to-face or remotely. Written communication, on the other hand, uses written language to exchange information and ideas, typically through electronic means such as email, messaging, and social media platforms.
As technology has advanced over the last decades, Computer-Mediated Communication (CMC) has brought forth a new set of language forms, and people have progressively accepted this digitalized interpersonal communication as a norm (Walther, 2011). Despite its wide usage in our society, CMC poses major challenges, including the difficulty of encoding context and emotion simultaneously during online interaction due to the lack of nonverbal cues. For example, text-based emails, messages, and documents may not transfer emotions along with the text as effectively as in-person communication does. The absence of nonverbal cues in CMC settings makes it significantly harder for recipients to interpret the intended meaning accurately, as nonverbal expressions are crucial communication components for both sender and receiver (Walther, 1992). The recipient or other individuals in CMC will inevitably interpret the message, either consciously or unconsciously, based on their own perceptions and understandings (Hall et al., 2019).
Therefore, despite their shared function of delivering information, several key differences exist between spoken and text communication. One of the biggest challenges is that tone and emotion are often more difficult to convey in text communication. In this regard, some research efforts have demonstrated the feasibility of conveying human emotions through typographical features to enhance the quality of text-based interaction in online settings (Choi & Aizawa, 2019; Ho, 2013; Tsonos & Kouroupetroglou, 2011).
The objective of the present study is to review the existing research that highlights the potential of reducing the emotional gap between sender and receiver in text-based communication by demonstrating the feasibility of conveying human emotions through text features. The study also suggests research opportunities to reveal and utilize the relationships between human emotions and text features.
Emotion and Typography
The challenge of accurately conveying emotions and intentions through written communication in a digital environment is widely recognized, with a notable deficiency in emotional expression. To overcome this challenge, recent studies have demonstrated the potential of typography to facilitate effective emotional communication. Changes in typographic elements, including font style (bold, italics, bold-italics) and font properties (type, size, color, and background color), have historically been used in many documents in place of plain text to convey speakers’ emotions and context. The typography of text has therefore been considered to provide implicit information beyond its external properties. Bork (1982) attempted to develop a taxonomy of various types of text features, including lines, spacing, and characters.
Regarding the implicit meaning of typography, Mackiewicz & Moeller (2004) demonstrated that every typeface has a different personality, or the ability to convey different feelings and moods. They conducted an experiment analyzing participant ratings of typefaces for different personality attributes and investigated why participants rated typefaces high or low on particular attributes. Choi & Aizawa (2019) conducted surveys to show how changing the typeface of a message impacts its meaning. For the preliminary study, they asked participants to use sharply contrasting typefaces while conversing through messaging to check feasibility. The authors also conducted role-playing and focus group interviews for the qualitative user study. The results indicate that using different typefaces in a message can influence user emotions; using multiple typefaces generated a lively mood and active responses during texting. Rashid et al. (2006) presented the possibility of emotive typography. They proposed framework examples that express five different emotions (Fear, Anger, Sad, Happy, and Disgust) using animated text captions. These frameworks were empirically tested, and the authors found that creating animated captions containing the designer’s perception of emotions is feasible.
Along with improvements in technology for graphical visual representation, more advanced and dynamic manipulations of typography have become possible in graphic design. Ho (2013) examined the connection between emotional responses and typography and proposed novel criteria for categorizing typography in three different ways that consider the emotional effects of typographic design. The author concluded by recommending that the design/emotion field be extended beyond product design to graphical areas such as typographic design, kinetic typography, and related forms. Recently, Lim (2022) demonstrated “Interactive Typography,” a language that combines the message-delivery effects of the visuals of video language with the information-delivery effects unique to fonts. Through various situations and artwork production, the author demonstrated that interactive typography depicts variations in human emotions by altering numerous components such as shape, color, and speed. The author also suggested that “Interactive Typography” can be used to express emotions, saying, “Previously, the focus was on aesthetic concepts that emphasized beauty rather than letters for reading, but modern typography focuses on delivering accurate information and attracting viewers’ attention.” The author suggested that more studies should be performed to categorize and assess typographic change elements according to defined criteria.
In summary, these studies indicate that different typographic features can deliver implicit emotion and context along with the explicit meaning of the text. However, the relationship between specific typographic features and their associated emotions is not yet well understood, which limits effective communication and a more scientific use of typography.
Information Encoding in Written CMC
Given the importance of nonverbal cues such as emotion and context in communication, there have been attempts to incorporate these nonverbal expressions effectively into speech using various means (Pierre-Yves, 2003; Iida et al., 2000). As with speech communication, there have been efforts in written communication to combine text with nonverbal features that deliver context, including the changes in typography mentioned in the previous section.
However, while face-to-face speech communication allows speakers to present nonverbal and verbal information simultaneously (e.g., raising voice tone and speed along with gestures), written communication, including CMC, usually requires separate steps for presenting verbal and nonverbal information. Tang et al. (2008) classified two distinct types of media encoding, primary and secondary, to highlight the encoding process of text-based communication. Primary encoding refers to the basic presentation of information through mediums such as speech and text. Secondary encoding accompanies primary encoding and provides additional information, frequently through presentation style, such as emphasizing speech with gestures or highlighting critical text in a different color. After completing the primary information encoding, text editing typically requires users to perform secondary encoding tasks, such as modifying text properties or adding contextual nuances. This separation of encoding procedures in written CMC may yield inefficient interaction compared to speech communication. That is, manipulating text features in CMC requires users to perform additional actions (secondary encoding), such as selecting text, pointing to a menu, searching the menu, and selecting a menu item with a pointing device (e.g., mouse or trackpad), after typing the text with a keyboard (primary encoding).
To overcome the limitation of separated written information encoding, studies have explored the possibility of simultaneously encoding text properties and contextual cues. Kim & Kaber (2009) proposed new text encoding methods using foot pedals. The prototype allows users to type text using a regular keyboard while using foot pedals to manipulate text properties such as text size and font face (see Figure 1). The experimental results demonstrated that two pedals with 1st-order control could be useful without increasing users’ workload and were comparable to conventional mouse use in terms of performance. The authors also suggested that foot pedals could serve specific user groups with physical limitations in manipulating object properties when interacting with systems. Another study (Kim & Liu, 2023) demonstrated a novel interaction method for manipulating a text property (size) while typing on mobile devices. A prototype text messenger was developed and evaluated; it allows users to adjust text size with tilting gestures while typing, instead of setting the text size in an extra step after typing.

Figure 1. Dynamic text editing using foot pedals on a desktop computer (Kim & Kaber, 2009).
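As a rough illustration of simultaneous primary and secondary encoding, the sketch below pairs each typed character with a font size driven by a pedal signal. It assumes the usual human factors meaning of 1st-order (rate) control, in which the input sets the rate of change of the output rather than the output itself. The function names, gain, time step, and size limits are hypothetical and are not taken from Kim & Kaber (2009).

```python
def first_order_step(size_pt, pedal, gain_pt_per_s=8.0, dt_s=0.1,
                     min_pt=8.0, max_pt=72.0):
    """Advance the font size by one time step.

    pedal is a deflection in [-1, 1]; under rate control it sets how fast
    the size changes, not the size itself. All constants are illustrative.
    """
    new_size = size_pt + gain_pt_per_s * pedal * dt_s
    # Keep the size inside a plausible range.
    return max(min_pt, min(max_pt, new_size))


def type_with_pedal(chars, pedal_samples, start_pt=12.0):
    """Pair each typed character (primary encoding) with the font size
    in effect when it was typed (secondary encoding)."""
    size = start_pt
    styled = []
    for ch, pedal in zip(chars, pedal_samples):
        size = first_order_step(size, pedal)
        styled.append((ch, round(size, 1)))
    return styled


# The pedal is at rest for the first character and fully pressed afterwards,
# so the text grows while the user keeps typing.
print(type_with_pedal("abc", [0.0, 1.0, 1.0]))
```

The point of the sketch is only that no separate selection or menu step occurs: the size attribute is recorded at the moment each character is entered.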
In sum, written communication in CMC requires two information encoding steps: entering the text content and manipulating the context information. Since these two separate steps impose additional tasks on users, research efforts have tried to show the validity of real-time text encoding through one-step interaction.
Future Study Suggestions
Based on the issues with written text in CMC, including the limitations in delivering nonverbal information along with the text and in encoding both effectively, the present study suggests several research issues. Two general research questions can be posed: 1) what kinds of text properties (typography) are associated with specific emotions; and 2) how to prepare and present text that integrates property changes effectively in online written communication.
Relationships between Text Features and Emotion
As mentioned previously, research efforts have investigated the association between typographical features and emotion from a design perspective, because changes in text features can transfer nonverbal communication information such as emotion or context. However, a common limitation among these studies is that they rely solely on subjective methodologies and analysis approaches, lacking empirical evidence or scientific verification. For example, Tsonos & Kouroupetroglou (2011) used the Self-Assessment Manikin test to extract a reader’s emotional state from text typography. That is, text features such as font, size, intensity, and italics can be manipulated, but the underlying scientific basis for how these morphological features relate to specific emotions has yet to be established.
The relation between emotion and text features could be identified and established through scientific product design methods (Desmet & Hekkert, 2002; Suri, 2003). One of the most applicable approaches is Kansei Engineering (Nagamachi, 1995), which is based on the "measure of meaning" method (Osgood et al., 1957). Kansei Engineering, sometimes referred to as emotional or affective engineering, is a term that was created by Mitsuo Nagamachi in the 1970s. It is defined as “translating technology of a consumer’s feelings and image for a product into design elements” (Nagamachi, 1995). Kansei Engineering focuses on the consumer’s psychological feelings and needs in response to interactions with certain products or designs. It was originally used to design new products based on customers’ feelings and demands. To capture customers’ feelings, the semantic differential (SD) method (Osgood et al., 1957) was primarily used to measure and decompose the psychological meaning of the product. Although Kansei Engineering was developed and used in Japan, typically for designing products, a similar approach using SD has been used in many human factors studies.
Kansei Engineering has been used to measure feelings in diverse areas. Chun et al. (2022) examined how the exterior design features of Urban Air Mobility (UAM) relate to human perception and preference using Kansei Engineering. To understand users’ common perception of the UAM exterior design, factor analysis was performed on 30 semantic pairings to identify latent descriptive terms that define design preference. The authors then related design features to emotional descriptor terms through experiments and statistical modeling. Ren et al. (2019) applied the method to automotive dashboard design. They proposed sensual image adjectives that describe dashboards in order to understand the relationship between user perception and dashboard design. The results indicated preferred perceptual factors, Gentle and Comfortable, which can serve as helpful indicators when developing new user-friendly designs. Soewardi & Nasution (2016) applied Kansei Engineering to the development of a baby bag design. To satisfy consumer needs, the researchers combined Kansei Engineering with fuzzy linguistic concepts when creating the new baby bags. They collected consumers’ desires to obtain Kansei words and evaluated them with five scales of the semantic differential. They then used these data to model the bag design prototypes. The resulting design met the customers’ requirements and was considered superior to the existing design. Another similar example is research on the development and validation of multidimensional measures of display clutter (Kaber et al., 2008; Kim et al., 2011). In that work, the investigators developed a metric for assessing subjective clutter in a flight cockpit display. They used the SD method to identify six latent factors in the perception of clutter and to develop a measure. They then demonstrated the validity of the measure through experiments and found relationships between the subdimensions of perception and both pilots’ performance and the physical characteristics of the displays.
Across the examples presented above, the major activities of the studies involve 1) capturing customers’ common feelings about the product in terms of psychological estimation; 2) identifying the design characteristics of the product; and 3) developing relationships between design characteristics and human feelings through statistical approaches, including multivariate methods such as factor analysis, multiple linear regression, and quantification theories (Florio & Jones, 2021).
Many such previous studies have successfully demonstrated the relationship between psychological factors, such as perception and emotion, and product design features. In this regard, the research method is expected to be applicable to the systematic evaluation of typographical features and their impact on human perception. Like the typical Kansei Engineering method, the study is expected to consist of the following sub-research: 1) identification of common descriptor terms for human emotional perception in written communication, by collecting possible semantic pairs and eliciting the underlying common mental structure of those meanings; 2) classification of typography features that may affect emotion, such as style (bold, italics, bold-italics), font (type, size, color, and background color), and character scaling and line spacing; 3) collection of ratings of typography features on the descriptor terms; and 4) data analysis to develop statistical models of the relation between typography features and human emotion. The resulting models are expected to be used in designing written text communication that effectively conveys changes in emotion through changes in text properties.
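The rating and analysis steps described above can be sketched in a few lines. The example below is a minimal illustration, not a proposed analysis: the ratings are invented, the single "calm" to "excited" scale and the feature levels are hypothetical, and a per-level mean comparison stands in for the factor analysis or regression models that an actual study would use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: each record is one participant's rating of a text
# stimulus on a 7-point semantic differential scale
# (1 = "calm", 7 = "excited"). All values are made up for illustration.
ratings = [
    {"style": "regular", "size": "small", "score": 2},
    {"style": "regular", "size": "large", "score": 4},
    {"style": "bold", "size": "small", "score": 4},
    {"style": "bold", "size": "large", "score": 6},
    {"style": "regular", "size": "small", "score": 3},
    {"style": "regular", "size": "large", "score": 5},
    {"style": "bold", "size": "small", "score": 5},
    {"style": "bold", "size": "large", "score": 7},
]


def level_means(records, feature):
    """Average the scale score for each level of one typography feature."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[feature]].append(record["score"])
    return {level: mean(scores) for level, scores in grouped.items()}


def main_effect(records, feature, low, high):
    """Difference in mean score between two levels of a feature."""
    means = level_means(records, feature)
    return means[high] - means[low]


style_effect = main_effect(ratings, "style", "regular", "bold")
size_effect = main_effect(ratings, "size", "small", "large")
print(level_means(ratings, "style"), style_effect, size_effect)
```

With a balanced design such as this one, the difference in level means corresponds to the main effect of each feature; unbalanced data or interacting features would call for the regression-based models mentioned in the text.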
Text Property Manipulations Incorporating AI
Manipulating text properties after preparing the text adds work in CMC, and some studies have shown interaction methods that allow users to complete the primary and secondary text encoding at the same time. Beyond such interaction methods, machine learning (ML) and artificial intelligence (AI) offer substantial opportunities for preparing effective written communication in CMC.
The recent rapid development of ML and AI technology has greatly impacted CMC in terms of human-computer interaction. One emerging application is automatic transcription, which converts human speech into text. It is currently widely used in many applications, including automatic captioning of audio and video. Figure 2 shows a screenshot of a video clip (a lecture video) with subtitles generated by automatic captioning. This means that the primary text encoding can be completed by a system incorporating ML, without typing text on a keyboard.

Figure 2. Automatic captioning in a video clip.
However, automatic transcription as currently used generates plain text, without the text property changes that may deliver emotional information. Manual secondary text encoding is still required to manipulate the text properties after the primary text is obtained. To complete secondary text encoding automatically and simultaneously, another ML function can be used: Speech Emotion Recognition (SER) (Ingale & Chaudhari, 2012; Khalil et al., 2019). While a speaker is talking, ML-based automatic transcription can recognize the verbal information and convert it into the primary text encoding. At the same time, SER can recognize the speaker’s nonverbal information as emotion and convert it, in real time, into text property changes that deliver the speaker’s emotion, based on the relations between emotion and text properties that remain to be identified.
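A minimal sketch of how the two outputs might be combined is shown below. It does not call any real transcription or emotion recognition system: each segment simply carries a transcribed string and an emotion label as if they had been produced upstream. The `Segment` type, the emotion labels, and especially the `EMOTION_STYLES` mapping are placeholders, since the relations between emotion and text properties are yet to be established.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Segment:
    text: str     # primary encoding: words from automatic transcription
    emotion: str  # label assumed to come from a speech emotion recognizer


# Placeholder mapping from emotion label to text properties. These pairings
# are invented for illustration, not empirically derived.
EMOTION_STYLES = {
    "anger": {"font-weight": "bold", "font-size": "120%"},
    "sadness": {"font-style": "italic", "font-size": "90%"},
    "neutral": {},
}


def style_segment(segment):
    """Apply the secondary encoding (text properties) to one segment."""
    props = EMOTION_STYLES.get(segment.emotion, {})
    text = escape(segment.text)
    if not props:
        # Unknown or neutral emotions fall back to plain text.
        return text
    css = "; ".join(f"{name}: {value}" for name, value in props.items())
    return f'<span style="{css}">{text}</span>'


def render_caption(segments):
    """Join styled segments into a single caption line."""
    return " ".join(style_segment(s) for s in segments)


caption = render_caption([
    Segment("I said no", "anger"),
    Segment("sorry", "neutral"),
])
print(caption)
```

Rendering to inline HTML styles is only one possible output format; the same mapping could drive any caption renderer that supports text properties.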
The combination of automatic transcription and speech emotion recognition systems is expected to: 1) facilitate the accurate transmission of the speaker’s emotions in digital communication settings; 2) enhance accessibility for individuals with hearing impairments by providing a more immersive digital environment that enables intuitive sensing of the speaker’s emotions, for instance, through more vivid audiovisual content; and 3) extend the application of emotion detection technology to auto-captioning, allowing for the automated identification of media sounds and the extraction of captions that encapsulate the embedded emotions.
Conclusions
Effective communication involves conveying emotions through tone, volume, and gestures in addition to the literal meaning of spoken words. However, expressing emotions through text is challenging because text lacks the real-time cues that accompany spoken communication. Although attempts have been made to encode emotions in text, the process often involves two steps, leading to inefficiencies. To address this issue, researchers have explored interactive systems that merge the two steps into real-time text encoding. Despite these advances, linking real-time text encoding techniques to voice, tone, gesture, and expression remains challenging.
The current study reviewed the issues revealed in previous studies as well as research efforts to overcome the challenges of designing more effective written communication in CMC. Extending previous studies, two future research directions were suggested: identifying the relationship between human emotion and text properties in written communication, and applying AI that combines automatic transcription with speech emotion recognition. These studies are expected to contribute to the design of more effective text-based CMC, including for users with hearing impairments.
