Abstract
In this work we test the hypothesis that the words people use can predict their psychological attachment style (secure, fearful, dismissing, preoccupied) as defined by Bartholomew and Horowitz. To test this hypothesis, we collected autobiographical texts written by 202 participants. Additionally, a psychological instrument (the Frías questionnaire) was applied to the same participants to measure their attachment style. We identified characteristic patterns for each attachment style by means of two approaches: (1) mapping words into a word space model composed of unigrams, bigrams and/or trigrams on which different classifiers were trained (Naïve Bayes (NB), Bernoulli NB, Multinomial NB, Multilayer Perceptrons); and (2) using a word-embedding based representation and a neural network architecture based on different units (LSTM, Gated Recurrent Units (GRU) and Bilateral GRUs). For the first approach, the best accuracy was 0.4079, obtained with a Boolean Multinomial NB on unigrams, bigrams and trigrams combined; for the second approach, the best accuracy was 0.4031, obtained with Bilateral GRUs.
Introduction
Attachment theory was presented in 1969 by John Bowlby [1]. It states that humans form patterns of affective ties and ways of interacting during their first years of life [2]; these patterns are generated for adaptive purposes, based on experiences or life history [3].
Ainsworth and her team proposed three styles of attachment: secure, anxious/ambivalent and avoidant [4].
If the mother is sensitive and receptive to the need of her child, he or she will respond with certainty to the separation and the return of the mother. [...] If the mother is indifferent when (s)he needs her and meddlesome when (s)he does not need her, the child will react with an attitude of anxiety / ambivalence by clinging to her sometimes and discharging her/his anger in others. [...] If (s)he had systematically rejected his/her attempts to establish physical contact, the child would adopt an attitude of avoidance [5].
Later, Bartholomew and Horowitz proposed one of the most accepted models of attachment today [6, 7], in which there are four styles of attachment that depend on two dimensions: the level of anxiety and intimacy, understood as the capacity to form close relationships (See Fig. 1).

Attachment styles as defined by Bartholomew and Horowitz.
There are two main ways to determine a person’s attachment style: a self-report instrument or an interview. The psychological interview is considered one of the most important methods for qualitative analysis in psychology; however, it is also one of the methods that requires the most training for those who administer it. This may be a disadvantage in attachment studies that require a large sample and lack a sufficient number of interviewers. Self-report instruments, on the other hand, can be generated and applied in different ways; however, they tend to suffer from social desirability (the need for approval), a common bias in the truthfulness of participants’ responses that arises when they form hypotheses about what the researcher wants to hear, based on the content of the questions or test items of the instrument.
We consider that an alternative to these two traditional methods is the linguistic analysis of texts written by people who do not know in advance that the texts will be analyzed or, failing that, who have no cues that would let them form hypotheses about what the researcher wants. In our view, the most suitable way to carry out this analysis is to extract linguistic patterns characteristic of each attachment style, then measure the extent to which these patterns occur in a given text and, in this way, classify the individual into one of the attachment styles.
Computational Linguistics is one of the most important alternatives for searching for and measuring these patterns. Thanks to the information processing power of computers, we can carry out more quantitative studies. This brings greater objectivity compared with other linguistic methods that are more subjective or whose quantification is slower because it must be done by humans.
The method used to address the problem consists of classifying each biography into one of the four types of psychological attachment (see Section 3). For this, we experiment with two different groups of algorithms. The first is based on traditional natural language processing techniques such as lemmatization, binarization, stop-word removal, n-grams, and attribute selection to build a word space model on which different classifiers can be trained (Section 3.1). The second is a word embedding-based technique using Bilateral Gated Recurrent Units: a vector representation of short autobiographies written by undergraduate students is generated and then used to classify the psychological attachment style (Section 3.2).
Linguistic patterns identified could serve as a basis to apply an instrument or measurement technique of attachment. In fact, this hypothetical measurement technique has the advantage of not requiring the individual to be aware that he or she is being evaluated since this awareness involves the risk of a bias in their responses—as stated before, for example, through social desirability or desire to please the experimenter.
Greater reliability in measuring attachment has repercussions in the broad range of sectors where social psychology can be applied, such as education, public policy, market strategy, job training, and clinical practice, among many others.
This research aims to create bridges between different areas of knowledge. Among these we can find Social Psychology, Computational Linguistics and Artificial Intelligence. This is how interdisciplinarity is achieved, one of the objectives that recurrently arise in current academic trends.
Section 2 refers to current works related to identifying linguistic patterns for finding psychological traits. Then, in Section 3 we describe our proposed method. Experiments and results are detailed in Section 4, and finally we draw our conclusions and discuss future work in Section 5.
Psychology has been closely linked to pattern recognition since its inception as a science with the founding of Wilhelm Wundt’s laboratory in 1879. In this laboratory, an attempt was made to find patterns of perceptions, feelings, ideas, etc., by using the method of introspection [8]. Currently, pattern recognition remains closely related to the various branches of Psychology. For example, Psychophysiology searches for patterns in the waves recorded by electroencephalograms and seeks to associate them with different states such as wakefulness, sleep and coma, as well as pathological states such as epilepsy [9]. In Experimental Psychology, different patterns of behavior are sought, usually in animals, by manipulating variables such as the delivery time of a reinforcer, the type of reinforcer and the action necessary to obtain it [10]. In Cognitive Psychology, patterns have been found in reaction times, perception, memory, decision making and even in human pattern recognition itself [11]. Finally, Social Psychology is interested in patterns of attitudes, self-concept, persuasion, and all psychological phenomena related to the interaction of two or more individuals [12]. Attachment is a concept of Social Psychology; it proposes that each individual has one of four groups of characteristic patterns for relating to others (attachment styles).
James Pennebaker has studied a large number of topics that relate Psychology to Linguistics using Computational Linguistics tools. His publications address the differences between the number of words used by men and women, the type of vocabulary and the subject of conversations between students, and the psychological implications of the use of natural language [13–15]. Particularly noteworthy is LIWC (Linguistic Inquiry and Word Count), software developed by Pennebaker and collaborators [16] that analyzes a text by counting words and grouping them into predefined categories corresponding to psychological dimensions such as emotions, self-references, causal words, etc.
Although there have been some approaches linking Computational Linguistics and Psychology, we have not found in the literature any application of the former to the specific subject of psychological attachment. This is why we consider it important to explore this field with the help of natural language processing. In addition, research such as that of Song et al. [17] and Huynh et al. [18] reinforces the idea that behavior and intentions can be predicted from linguistic analysis and, therefore, that it is feasible to find linguistic patterns for each attachment style.
Description of the psychological instrument
According to attachment theory and research [7], there are two fundamental ways in which people differ from one another in the way they think about their close relationships. First, some people are more anxious than others. People who are high in attachment-related anxiety tend to worry about whether others really love them and often fear rejection. People low on this dimension are much less worried about such matters. Second, some people are more avoidant than others. People who are high in attachment-related avoidance are less comfortable depending on others and opening up to others. Different combinations of anxiety and avoidance result in four types of personality attachment, as shown in Fig. 1: People who are
Based on the theory of Bartholomew and Horowitz [3], Frías’ instrument [19] considers attachment as the result of the combination of two independent dimensions, avoidance and anxiety; therefore, the instrument yields a value for each of these dimensions. The instrument is composed of 36 items rated on a Likert scale from 1 to 7 that indicates how strongly the participant agrees with each statement, where 1 means not at all and 7 means totally. Each item is a statement about the way the participant feels in close emotional relationships (with romantic partners, close friends or relatives).
Then these scores were converted to the representation of the Horowitz and Bartholomew attachment types (secure, fearful, dismissing and preoccupied), as shown in Fig. 1.
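The conversion from the two dimension scores to the four attachment types can be sketched as follows. This is only an illustrative sketch: the linear rescaling from the 1 to 7 scale to the -1 to 1 scale, the cut point at the scale midpoint, and the function names are assumptions, not the exact scoring procedure of the Frías instrument. The quadrant labels follow the usual reading of the Bartholomew and Horowitz model (low anxiety and low avoidance for secure, high on both for fearful).

```python
def normalize(score, low=1.0, high=7.0):
    """Linearly rescale a dimension score from [low, high] to [-1, 1] (assumed rescaling)."""
    midpoint = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (score - midpoint) / half_range


def attachment_style(anxiety, avoidance, threshold=0.0):
    """Map raw anxiety and avoidance scores (1 to 7) to one of four quadrants.

    The threshold of 0 on the normalized scale is an assumption made for this sketch.
    """
    high_anxiety = normalize(anxiety) > threshold
    high_avoidance = normalize(avoidance) > threshold
    if high_anxiety and high_avoidance:
        return "fearful"
    if high_anxiety:
        return "preoccupied"
    if high_avoidance:
        return "dismissing"
    return "secure"


# Hypothetical participants: (anxiety, avoidance) on the original 1 to 7 scale.
examples = [(2.0, 2.5), (6.0, 2.0), (2.0, 6.0), (6.0, 6.0)]
styles = [attachment_style(anx, avo) for anx, avo in examples]
```

Scores that fall exactly on the threshold are treated as low in this sketch; a real scoring procedure would need an explicit rule for such borderline cases.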
Proposed methods
In order to predict the attachment style from an autobiography, we experimented with several classifiers that operate on two main groups of features. The first group consists of Word Space Model features, i.e., a vector that indicates the presence (or absence) of a word or a group of words (n-grams) in a document. For example, Table 1 represents two documents with binary features, one for each word in the vocabulary. See Subsection 3.1.
WSM representation for D1:the girl plays the piano and D2:the boy plays a game
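The binary representation of D1 and D2 can be reproduced with a short sketch. The helper names are hypothetical, and the column order (order of first appearance) is an assumption chosen for readability.

```python
def build_vocabulary(documents):
    """Collect the distinct words of all documents in order of first appearance."""
    vocabulary = []
    seen = set()
    for document in documents:
        for token in document.split():
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


def binary_vector(document, vocabulary):
    """Return 1 for each vocabulary term present in the document, 0 otherwise."""
    present = set(document.split())
    return [1 if term in present else 0 for term in vocabulary]


documents = {
    "D1": "the girl plays the piano",
    "D2": "the boy plays a game",
}
vocabulary = build_vocabulary(documents.values())
wsm = {name: binary_vector(text, vocabulary) for name, text in documents.items()}

for name, vector in wsm.items():
    print(name, dict(zip(vocabulary, vector)))
```

Note that the repeated word "the" in D1 still yields a single 1, since the features are Boolean rather than counts.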
The second group consists of word embedding methods, where each document is represented as a vector in which each dimension represents an aspect or subject present in it (see Table 2). Here each column represents a subject or topic that is present to a certain degree in a document. For example, we can think of Z1 as the amount of determiners in each document, Z2 as the relationship with musical instruments, Z3 as the presence of human beings (boy, girl), Z4 as the degree of maleness (boy vs. girl), etc., up to Zn, which could represent ludic elements (game). Topics are inferred automatically from the context in which words are used, and words are automatically grouped. Usually n ranges from a few tens to several hundreds (usually not more than 300). Subsection 5 gives more details on the extraction of these features.
Word embeddings representation for D1 and D2
Figure 2 shows the procedure used for personality prediction from autobiographies using Word Space Model-based methods. First, the whole dataset is cleaned with regular expressions to handle abbreviations and punctuation correctly. Then, a Word Space Model (WSM) is created from the text of each autobiography. Several variations of this WSM are built: n-gram extraction, lemmatization, and stop-word removal are all optional. This is represented in Fig. 2 with arrows showing that each of these processes can be skipped. Finally, once the WSM is built, we experimented with dimensionality reduction to improve classification.

Word Space Model based procedure.
The classifiers we experimented with were: Multi Layer Perceptron (MLP), Naïve Bayes (Gaussian), Bernoulli Naïve Bayes and Multinomial Naïve Bayes.
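In the Bernoulli event model, each feature $x_i$ is a Boolean that indicates whether term $i$ occurs in the document, and the likelihood of a document given a class $C_k$ is

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{n} p_{ki}^{x_i} \, (1 - p_{ki})^{(1 - x_i)}$$

where $p_{ki}$ is the probability of class $C_k$ generating the term $x_i$. This event model is especially popular for classifying short texts. It has the benefit of explicitly modeling the absence of terms. Note that a naive Bayes classifier with a Bernoulli event model is not the same as a multinomial NB classifier with frequency counts truncated to one.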
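The difference between the two event models can be illustrated with a small sketch. The probabilities below are made-up values, not estimates from our data, and the function names are hypothetical. The Bernoulli log-likelihood adds a term for every absent word, whereas the multinomial log-likelihood on Boolean counts only adds terms for the words that are present.

```python
import math


def bernoulli_log_likelihood(x, p):
    """log p(x | C_k) = sum_i [x_i log p_ki + (1 - x_i) log(1 - p_ki)]."""
    return sum(
        xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
        for xi, pi in zip(x, p)
    )


def multinomial_log_likelihood(x, theta):
    """sum_i x_i log theta_ki, ignoring the class-independent multinomial coefficient."""
    return sum(xi * math.log(ti) for xi, ti in zip(x, theta) if xi)


x = [1, 0, 1]              # Boolean document vector: terms 1 and 3 are present
p_k = [0.8, 0.1, 0.5]      # hypothetical Bernoulli term probabilities for class C_k
theta_k = [0.5, 0.2, 0.3]  # hypothetical multinomial term probabilities for class C_k

bernoulli_ll = bernoulli_log_likelihood(x, p_k)          # uses the absent term too
multinomial_ll = multinomial_log_likelihood(x, theta_k)  # skips the absent term
```

In the Bernoulli case the absent second term contributes log(1 - 0.1), which is why the two models are not equivalent even when counts are truncated to one.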
N-grams
An n-gram is a contiguous sequence of n items from a given sample of text. In this work, these items are words, and we used only contiguous n-grams. Consider, for example, the following sentence: The boy is playing with the red ball. The set of unigrams for this sentence is: {The, boy, is, playing, with, the, red, ball }. The set of bigrams is: {The_boy, boy_is, is_playing, playing_with, with_the, the_red, red_ball }, and finally the trigram set for this example sentence is: {The_boy_is, boy_is_playing, is_playing_with, playing_with_the, with_the_red, the_red_ball }.
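The n-gram sets of the example sentence can be generated with a few lines. This is a minimal sketch that assumes whitespace tokenization and joins words with an underscore, as in the example; the function name is hypothetical.

```python
def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The boy is playing with the red ball".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Combined feature list, as when unigrams, bigrams and trigrams are used together.
all_features = unigrams + bigrams + trigrams
```

A sentence of 8 words yields 8 unigrams, 7 bigrams and 6 trigrams, so the combined feature space grows quickly with n.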
Feature selection
Feature selection is the process of selecting a subset of relevant features for model construction. Feature selection techniques have been successfully applied in text classification tasks [24]. The central premise when using a feature selection technique is that the data contains many features that are either redundant or irrelevant, so that they can be removed without loss of information. Redundancy and irrelevance are two distinct notions, since a relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated [25].
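The distinction between redundant and irrelevant features can be made concrete with a correlation-based merit score. The sketch below follows the general idea of correlation-based feature selection (CFS), using symmetrical uncertainty as the correlation measure; it is a simplified illustration under these assumptions, with toy data and hypothetical function names, and not necessarily the exact procedure used in our experiments.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(a, b):
    """2 * I(a; b) / (H(a) + H(b)), a correlation measure in [0, 1] for discrete data."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    information_gain = h_a + h_b - entropy(list(zip(a, b)))
    return 2.0 * information_gain / (h_a + h_b)


def cfs_merit(columns, labels):
    """Merit of a feature subset: high class correlation, low inter-feature correlation."""
    k = len(columns)
    r_cf = sum(symmetrical_uncertainty(c, labels) for c in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = sum(symmetrical_uncertainty(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


labels = [0, 0, 1, 1]      # toy class labels
relevant = [0, 0, 1, 1]    # perfectly predicts the class
redundant = [0, 0, 1, 1]   # relevant, but duplicates the first feature
irrelevant = [0, 1, 0, 1]  # carries no information about the class

merit_single = cfs_merit([relevant], labels)
merit_with_redundant = cfs_merit([relevant, redundant], labels)
merit_with_irrelevant = cfs_merit([relevant, irrelevant], labels)
```

In this toy case, adding the redundant feature leaves the merit unchanged, while adding the irrelevant one lowers it, so a greedy search over subsets would keep only the first feature.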
Word embedding based methods
The Long Short-Term Memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber [27]. Unlike the traditional recurrent unit, which overwrites its content at each time step, an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information (the existence of the feature) over a long distance, hence capturing potential long-distance dependencies.
Gated Recurrent Units (GRU) were proposed by Cho et al. [28] to make each recurrent unit adaptively capture dependencies of different time scales. Similarly to the LSTM unit, the GRU has gating units that modulate the flow of information inside the unit, however, without having separate memory cells.
The state of a GRU at time $t$ is a linear interpolation between the previous state $h_{t-1}$ and the newly computed candidate state $\tilde{h}_t$, weighted by an update gate $z_t$: $h_t = (1 - z_t)\, h_{t-1} + z_t\, \tilde{h}_t$. This procedure of taking a linear sum between the existing state and the newly computed state is similar to the LSTM unit. The GRU, however, does not have any mechanism to control the degree to which its state is exposed, but exposes the whole state each time.
GRUs have proven to have better performance than LSTMs in several applications [29], and recently they have been applied to text classification [30]. Therefore, we propose using GRUs to predict attachment style from short biographies. Additionally, we propose using a Bilateral GRU-based architecture to tackle this problem. This design is shown in Fig. 3.

Proposed BGRU-based architecture
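The GRU state update and the bilateral (two-direction) traversal can be sketched with a toy scalar version. This is a didactic sketch only: it uses a scalar input and a scalar hidden state, omits bias terms, and the parameter names and values are hypothetical. The update follows the commonly used formulation h_t = (1 - z_t) h_{t-1} + z_t h~_t, where z_t is the update gate and h~_t the candidate state.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, params):
    """One GRU update for a scalar input x and scalar previous state h_prev."""
    z = sigmoid(params["W_z"] * x + params["U_z"] * h_prev)  # update gate
    r = sigmoid(params["W_r"] * x + params["U_r"] * h_prev)  # reset gate
    h_candidate = math.tanh(params["W_h"] * x + params["U_h"] * (r * h_prev))
    # Linear interpolation between the existing state and the candidate state.
    return (1.0 - z) * h_prev + z * h_candidate


def run_gru(sequence, params, h0=0.0):
    """Fold a sequence into a final state, reading it left to right."""
    h = h0
    for x in sequence:
        h = gru_step(x, h, params)
    return h


def run_bilateral_gru(sequence, forward_params, backward_params):
    """Read the sequence in both directions and return both final states."""
    forward_state = run_gru(sequence, forward_params)
    backward_state = run_gru(list(reversed(sequence)), backward_params)
    return (forward_state, backward_state)


# Hypothetical parameter values, chosen only to make the example concrete.
params = {"W_z": 0.5, "U_z": -0.3, "W_r": 0.2, "U_r": 0.1, "W_h": 0.7, "U_h": 0.4}
states = run_bilateral_gru([0.1, -0.4, 0.9], params, params)
```

With vector-valued states, concatenating the two directions doubles the size of the encoding, which is consistent with bilateral first layers being twice as wide as single-direction ones.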
For the experiments described in this section, we asked 202 participants to write by hand a short autobiography or personal experience of approximately one page. Once the text was written, they were asked to answer the Frías instrument [19] for attachment measurement. Participants were university students at the Xochimilco and Iztapalapa campuses of the UAM (Universidad Autónoma Metropolitana). Their average age was 22.13 years, with a standard deviation of 3.5. The age range was from 17 to 41 years; this wide range is due to the fact that a characteristic of the UAM student population is the relatively high percentage of students who work or who resume their studies after several years of work. 51% of the participants were men and the remaining 49% were women. Details are summarized in Table 3.
Statistics on participants’ age
An extract of the measurements using the Frías instrument is shown in Table 4. For each subject there are the scores of the measurement for anxiety and avoidance in its original scale (anxiety and avoidance in scale of 1 to 7), in its normalized scale (anxiety and avoidance in scale of -1 to 1) and the final class corresponding to the four attachment styles described in Section 2.1.
Sample of attachment measuring on participants (using questionnaire)
Subsequently, the handwritten autobiographies and experiences were transcribed into a text file and preprocessed using the Perl programming language to facilitate their subsequent analysis. This preprocessing consisted of lowercasing, removing accents and punctuation, and removing articles, prepositions and pronouns. Apart from this, the texts remained as they were originally written, including spelling and writing errors.
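The preprocessing steps listed above can be sketched as follows. The original preprocessing was done in Perl; this Python version is only an illustration, and the stop-word list is a small hypothetical sample rather than the list actually used.

```python
import string
import unicodedata

# Hypothetical sample of Spanish articles, prepositions and pronouns.
STOP_WORDS = {
    "el", "la", "los", "las", "un", "una",  # articles
    "de", "en", "a", "con", "por", "para",  # prepositions
    "yo", "me", "te", "se",                 # pronouns
}


def strip_accents(text):
    """Remove combining marks after canonical decomposition (this also maps ñ to n)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def preprocess(text):
    """Lowercase, strip accents and punctuation, and drop stop words."""
    text = strip_accents(text.lower())
    text = text.translate(str.maketrans("", "", string.punctuation + "¿¡"))
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = preprocess("Yo estudio en la Universidad, ¡qué tonterías!")
```

Spelling errors are left untouched by this procedure, so a misspelled word simply becomes its own token.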
For the experiments described in this section, we selected 160 autobiographies (∼80%) as the training set and 42 (∼20%) as the test set. We drew different random subsets from the dataset in order to implement 5-fold cross validation. The average number of words per autobiography was 150, ranging from a minimum of 15 to a maximum of 640, with a standard deviation of 83.46. The distribution of classes is shown in Table 5. Thus, the simplest baseline consists of always selecting the most frequent class (fearful). This baseline appears as the last row of Table 6 for comparison.
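The evaluation protocol and the majority-class baseline can be sketched as follows. The fold construction, the random seed and the function names are assumptions for illustration; with 202 samples this split gives test folds of 40 or 41 texts, which only approximates the 160/42 partition reported above.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k roughly equal random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training class."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_class)
    return hits / len(test_labels)


splits = list(k_fold_indices(202, k=5, seed=0))

# Toy labels (not the real class distribution) to show the baseline computation.
toy_accuracy = majority_baseline_accuracy(
    ["fearful", "fearful", "secure"],
    ["fearful", "secure", "fearful", "dismissing"],
)
```

Any classifier worth reporting should exceed this baseline on the same folds.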
Independent probability of classes
Accuracy summary for WSM features.
We conducted two groups of experiments. The first group consisted of classifiers based on WSM. The second group is based on deep learning methods. Detailed results on both approaches are given in the following subsections.
Table 6 shows results for four different classifiers and the baseline. This baseline was calculated by always predicting the most frequent class. Because Multinomial Naïve Bayes yielded the best results for this task, we continued with this configuration in the following experiments. Table 7 shows our results for predicting the attachment style using single words (unigrams) considering all attributes (see Section 3.1.2), and then applying attribute selection on these unigrams. The following rows in Table 7 show the accuracy obtained by combining different n-grams (unigrams, bigrams and trigrams).
Accuracy summary for WSM-ngram features. Attr.Sel: Attribute selection, used to reduce feature dimensionality. All features are boolean. No lemmatization was performed. Highest accuracy is shown in bold
In order to analyze the contents of the autobiographies that correspond to each kind of attachment, we examined the characteristic n-grams for each class, which were obtained by applying attribute selection on the biography dataset and weighting them with a Multinomial Naïve Bayes classifier. Table 8 shows the contribution of each word to each class. Words appear as they were written. For example, nisiquiera (not even) is incorrectly written without a space (it should be ni siquiera), while tonterias (nonsense) lacks the diacritical mark (tonterías). The highest probabilistic contribution of each word to a class is shown in boldface.
Probabilities of words given the attachment type
We can see, for example, that grandmother, additionally and noteven contribute mainly to the preoccupied attachment style. This can be interpreted as family figures and a particular style of structuring argumentation (additionally, noteven) being more prevalent for this class. The words fun and social (unsurprisingly) characterize the secure attachment style, while special, interesting and nonsense contribute to the dismissing class. This may indicate that these writers pass judgment on outside subjects, being more independent in the sense that they are able to qualify things by themselves. Finally, although the fearful attachment style seems to be evenly characterized by all words, it is the absence of words with a higher probability for another class, or the presence of a four-digit number (probably representing a year), that allows a subject to be classified as fearful.
Table 9 shows the most relevant bigrams for classification. Generally speaking, structures related to describing places, such as here in, well no, things have, from (national) state, (I) had it, as well as family (my grandmother), are relevant to the preoccupied class. Structures related to social interactions (in his/her), current occupation (study in), and positive emotion when describing facts (a great, and in) are related to the secure attachment style. Statements about their own age and about things that always happen to them characterize a fearful attachment, and finally the dismissing class is weakly represented by the for the structure, and is mostly selected when other features do not appear in the text.
Probabilities of words given the attachment type. S:Secure, F:Fearful, D:Dismissing, P:Preoccupied
For these experiments, we converted each word of the autobiography to a 300-dimensional vector. This was achieved by using Word2vec [31] trained on the Spanish Billion Words Corpus [32]. Roughly 90% of the words in our autobiographies were found in this corpus.
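The word-to-vector lookup and the coverage figure can be sketched as follows. The tiny embedding table stands in for the pretrained Word2vec model, coverage is computed over distinct words, and out-of-vocabulary words are mapped to a zero vector; all three are assumptions made for this illustration.

```python
def vocabulary_coverage(tokens, embeddings):
    """Fraction of distinct tokens that have a pretrained vector."""
    vocabulary = set(tokens)
    found = {token for token in vocabulary if token in embeddings}
    return len(found) / len(vocabulary)


def embed(tokens, embeddings, dim=300):
    """Map each token to its vector; unknown tokens get a zero vector (assumed policy)."""
    return [list(embeddings.get(token, [0.0] * dim)) for token in tokens]


# Hypothetical 3-dimensional embeddings standing in for 300-dimensional Word2vec vectors.
toy_embeddings = {
    "casa": [0.1, 0.3, -0.2],
    "abuela": [0.4, -0.1, 0.0],
}
tokens = ["casa", "abuela", "nisiquiera", "casa"]

coverage = vocabulary_coverage(tokens, toy_embeddings)
vectors = embed(tokens, toy_embeddings, dim=3)
```

Misspelled forms such as nisiquiera are typical of the words that a pretrained vocabulary does not contain.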
In order to assess the benefit of using bilateral GRUs, we experimented with different recurrent networks for encoding the biographies. We changed the architecture shown in Fig. 3 by substituting the initial layer of two GRUs with a single GRU (that is, using only left-to-right encoding), and then we experimented with a single LSTM as well. Additionally, we experimented with three and four layers in this architecture. The architectures with three layers used (300 50 4) neurons in each layer for non-bilateral units and (600 50 4) for bilateral units. The architectures with four layers used (300 300 50 4) and (600 300 50 4) respectively. We experimented with 3 to 11 epochs for each architecture, and varied the sequence length used for the recurrent networks: the average length of the short biographies (150) or its double (300), so that most of the words of each biography were covered. The accuracy of the best 20 configurations is shown in Table 10.
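The search space described above can be enumerated with a short sketch. It assumes a full grid over unit type, depth, epochs and sequence length, which may be larger than the set of configurations actually run; the dictionary keys and the helper name are hypothetical.

```python
from itertools import product

UNITS = ("LSTM", "GRU", "BLSTM", "BGRU")
DEPTHS = (3, 4)
EPOCHS = range(3, 12)          # 3 to 11 epochs
SEQUENCE_LENGTHS = (150, 300)  # average biography length and its double


def layer_sizes(unit, n_layers):
    """Neurons per layer: bilateral units double the width of the first layer."""
    first = 600 if unit.startswith("B") else 300
    return (first, 50, 4) if n_layers == 3 else (first, 300, 50, 4)


configurations = [
    {
        "unit": unit,
        "layers": layer_sizes(unit, n_layers),
        "epochs": epochs,
        "sequence_length": sequence_length,
    }
    for unit, n_layers, epochs, sequence_length in product(
        UNITS, DEPTHS, EPOCHS, SEQUENCE_LENGTHS
    )
]
```

Under these assumptions the grid contains 144 configurations, each of which would be trained and scored before ranking the best 20.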
Accuracy for the best 20 Word Embedding-based network configurations. The sequence length used for training was the average length of short biographies (150), or twice this number (300)
The learning rate was 0.001. For LSTM, the cost function was mean squared error. Following recent works [33–35], we experimented with small batch sizes, and found that the best batch size for this problem was a mini-batch of 8 samples.
In Table 10 the best accuracy is shown in bold, corresponding to a BLSTM-based network of 4 layers with (600 300 50 4) neurons respectively, 5 epochs and a sequence length of 150 words. The best results were obtained between 5 and 7 epochs. After this, performance decayed (positions 14 and 15 for BLSTM). LSTM (position 7) also performs close to BLSTM (position 5) when a double sequence length (300) is used. The same holds for GRU and BGRU (positions 17 and 12). In general, LSTM performed better with 3 layers (position 7) than with 4 layers (position 19). The same happened for GRU (positions 17 and 20).
In this work we used Computational Linguistics tools and deep neural networks to look for a relationship between the words used in brief autobiographical and experiential texts and the attachment style of their authors, for each of the four types of attachment according to the classification made with the Frías instrument. This hypothesis rests on the premise that personality classification can be predicted by analyzing short texts written by individuals. To look for this relationship, we experimented with two different approaches for modeling linguistic features.
In the first approach, the individuals were classified according to one of the four attachment styles of the theory of Bartholomew and Horowitz by using different word space model features and several classifiers (Multilayer Perceptron, Naïve Bayes, Bernoulli Naïve Bayes, Multinomial Naïve Bayes). This classification was made directly based on the words that the participants used in their autobiographical texts. We experimented as well with n-gram features, ranging from unigrams (words) to trigrams, and explored combinations of all of them. The best result was obtained by using Multinomial Naïve Bayes trained on unigrams, bigrams and trigrams altogether, filtered with CFS attribute selection. Obtained accuracy was 0.4079.
For the second approach, we used word embedding features obtained from Word2vec trained on the Spanish Billion Words Corpus, and experimented with different architectures. We varied the number of layers (three or four), and used different building units for the first layer of our network: LSTM, GRU, bilateral LSTM, and bilateral GRU, the latter two yielding the best accuracy. In particular, a BLSTM with four layers of 600, 300, 50 and 4 neurons, and a sequence length equal to the average number of words in the biographies, attained an accuracy of 0.4596 in 5 epochs, later decaying to 0.4306 and 0.4316 in 6 and 7 epochs respectively. Bilateral GRUs came second, with an accuracy of 0.4316 in a similar configuration. Our proposed configuration of traversing biographies in both directions (left to right and right to left) performed better than traditional LSTM or GRU-based methods, even with a comparable sequence length, that is, one covering most of the autobiography.
Although word space models (WSM) were about 5 percentage points below word embedding-based models in accuracy when classifying attachment style, WSM have the advantage of providing a useful analysis of the characteristics used to classify each attachment style. We have presented in this paper a brief study of the n-grams that probabilistically contribute the most to each class, finding word use and writing style features consistent with the meaning of each attachment style.
The accuracy attained does not allow us to conclude that the collected biographies can entirely replace the psychological instrument applied to measure attachment style; however, we found good indications that there are linguistic features (syntactic and semantic) that allow personality traits to be estimated. As future work, we plan to estimate the avoidance and anxiety axes directly, in order to obtain a more precise prediction of the position of a subject in the graph of attachment styles, that is, before classifying these values as belonging to one of four quadrants. This measure would allow us to identify cases near the border between two attachment styles (for example, almost between preoccupied and fearful), and to quantify how far our prediction is from this point.
There is still room for improvement in the word embedding approach, for example, by trying other embedding methods instead of Word2vec. Another option is to experiment with different sizes for the initial representation (instead of 300).
Finally, both approaches could be combined in an ensemble method, for cases in which one approach classifies some instances better than the other.
Acknowledgments
We acknowledge the support of the Mexican Government and Instituto Politécnico Nacional (IPN): COFAA, EDI, Projects SIP 20190077 and SIP 20195886; and Consejo Nacional de Ciencia y Tecnología–CONACyT (SNI, RedTTL), particularly through FOINS 360, Problemas Nacionales 5241.
