Abstract
Many cases of violence against children occur in homes and other close environments. Machine learning is a novel approach that addresses important gaps in ways of examining this socially significant issue, illustrating innovative and emerging approaches for the use of computers from a psychological perspective. In this paper, we aim to use machine learning techniques to predict adolescents’ involvement in family conflict in a sample of adolescents living with their families (community adolescents) and adolescents living in residential care centers, who are temporarily separated from their families because of adverse family conditions. Participants were 251 Spanish adolescents (Mage = 15.59), of whom 167 lived in residential care and 84 lived with their families. We measured perceived interparental and family conflict, adolescents’ emotional security, emotional, cognitive, and behavioral immediate responses to analog interparental conflict (IPC), and adolescents’ sociodemographic variables (i.e., age, gender). With a prediction accuracy of 65%, our results show that adolescents in residential care are not at greater risk for involvement in family conflict compared to adolescents living with their families. Age and gender are not salient predictive variables. We found that responses to analog IPC, adolescents’ emotional security, triangulation in IPC, and the presence of insults or blame during family disputes predict adolescents’ involvement in family conflict. These results point to variables with a potential predictive capacity, which is relevant for research and intervention.
Introduction
According to UNICEF’s data, 90% of incidents of violence against children and adolescents occur in their homes or other proximal environments. Violence against children is a social problem that impacts children and adolescents in significant ways, making prevention of violence a priority (Hillis et al., 2016).
Family matters are private and it is expected that undesirable family processes are underreported (Shehan & Greenstein, 2014). Still, in Spain, official statistics report that 6532 children were victims of intrafamily violence in 2018 (Caravaca & Teruel, 2020). These figures have shown a tendency to increase yearly and, additionally, there are some gender differences. More girls than boys are reported as victims of intrafamily violence, especially in the 14–17 age range. This suggests that age and gender are important variables that should be taken into account. In fact, adolescence is typically characterized by an increase in intrafamily conflict compared to previous developmental stages (Dittman et al., 2020). Family conflict can be defined as an active opposition between family members (Marta & Alfieri, 2014) that may lead to intrafamily violence (e.g., when recursive and escalating destructive conflict happens in the family). Family conflict is one of the risk factors that predicts adolescents’ involvement in antisocial behaviors (López-Larrosa & Rodríguez-Arias, 2012) and it is related to unhealthy relational patterns in adulthood (Heinze et al., 2020). Due to the increasing interest in predicting risks for children and adolescents who are involved in violent circumstances at home, the aim of this paper is to use machine learning techniques to predict adolescents’ involvement in family conflict.
We intend to identify predictors of involvement and no involvement in family conflict in a sample of adolescents living with their families (community adolescents) and adolescents living in residential care centers (RC) by measuring dimensions of interparental conflict (IPC) and family conflict, adolescents’ emotional security, emotional, cognitive, and behavioral immediate responses to analog IPC and adolescents’ age, gender, and rural/urban background. RC adolescents are a vulnerable population, and their responses to IPC have not been addressed often; furthermore, findings on the effect of adolescents’ interference in IPC and family conflict have been inconsistent. Our method aims to identify variables and procedures that may help to simplify the evaluation of risks for involvement in family conflicts. We focus on a sample of adolescents from community and residential care centers and, using machine learning algorithms, demonstrate that automatic methods can achieve reasonably good risk prediction performance. Furthermore, the analysis of the most effective models reveals ways to simplify future studies.
Background and Previous Work
Interparental Conflict
Interparental conflict (IPC) can be operationalized as differences of opinion or disagreements between parents (Cummings & Davies, 2010). Interparental conflict is one of the conflict types that may happen in the family system (Marta & Alfieri, 2014). IPC can be examined and categorized along two main dimensions: process and effects on children. Interparental conflict can be deemed destructive or constructive by examining the processes and strategies involved in the disagreement. Destructive IPC is characterized by escalated, unresolved disagreements between the parents, and can include verbal or physical aggression. In contrast, constructive conflict is characterized by positive affection, calm discussion, problem solving strategies, and resolution of the conflict (McCoy et al., 2009; López-Larrosa, Sánchez-Souto et al., 2019).
Additionally, previous literature has also identified some IPC as unresolved, which occurs when partners leave the situation without reaching a resolution about their disagreement. This type of conflict resembles destructive conflict due to its lack of resolution; however, unlike destructive conflicts, unresolved conflicts may cease to escalate when the partners leave the situation (Cox et al., 2001; Gomulak-Cavicchio et al., 2006; López-Larrosa, Sánchez-Souto et al., 2019).
In this study, we will differentiate among destructive, constructive, and unresolved IPC when examining their impact on adolescents in the family. As for its effects, interparental conflict can impact children and adolescents in the short term and the long term. According to Emotional Security Theory (EST) (Cummings & Davies, 2010; P. Davies & Cummings, 1994) and its new formulation (EST-R) (Davies & Sturge-Apple, 2007), children and adolescents have a fundamental drive to feel safe and secure in their families. Specifically, EST-R posits that through evolution, we have developed a social defense system to identify and respond to potential social threats, including threats from the family, and IPC can be identified as one of these possible threats (Davies et al., 2013). When adolescents witness IPC, their regulatory processes activate as an immediate response to perceived threats. These processes involve emotional, behavioral, and cognitive responses (Cummings & Miller-Graff, 2015; Koss et al., 2011; Schermerhorn et al., 2019). These are IPC’s short-term effects. In the long term, recurrent exposure to destructive IPC affects adolescents’ emotional, cognitive, and behavioral responses to IPC (Cummings & Davies, 2010; López-Larrosa, Sánchez-Souto et al., 2019) which can erode adolescents’ emotional well-being and their sense of security in their families (Cummings & Davies, 2010; Davies et al., 2013).
Adolescents’ Responses to IPC and Interference in Family Conflict
Machine learning uses a wide range of algorithms and principles that increase confidence in findings, including the identification of key predictive items, variables and constructs that are indicators of risk for adolescents’ problematic involvement in family conflict. Accordingly, the use of machine learning may be of great interest to many, including computer-based and social science researchers and professionals providing clinical services. As we illustrate in this manuscript, machine learning merits greater consideration as computer-based algorithms that may advance understanding from a psychological perspective. Thus, this manuscript addresses a gap in the empirical demonstration of the use of computers from a psychological perspective.
Adolescents’ interference in destructive IPC is a defensive response to preserve their emotional security by interrupting conflict escalation through triangulation or physical involvement in the conflict (Davies et al., 2015). This response has two main consequences. In the short term, it may put adolescents at risk for violence in high-conflict homes because they may get hurt during conflict escalation. In the long term, adolescents’ involvement in family conflict will not reduce future interparental disputes (Warmuth et al., 2018), and the emotional investment involved in preserving family peace may translate into risks for psychopathology (Davies et al., 2015).
Studying adolescents’ immediate responses to IPC in a naturalistic setting poses ethical concerns about adolescents’ physical and emotional safety. Thus, researchers in the field have increasingly adopted the use of analog technologies by presenting video vignettes depicting simulated IPC. Video-recorded vignettes of IPC are among the most-used procedures, and they afford a high level of control over the stimuli while avoiding the ethical issues associated with observational or experimental studies (Shelton et al., 2006). In this study, we will consider emotional, cognitive (perceived degree of resolution of conflict), and behavioral responses to simulated (analog) IPC as predictive variables of adolescents’ interference in their families’ conflict.
According to EST and EST-R, adolescents can distinguish among analog destructive conflict (escalated), unresolved, and constructive conflict. As constructive conflict is less threatening than destructive or unresolved conflict, the social defense strategies activate differentially (Davies et al., 2013). Thus, destructive analog IPC elicits significantly more negative emotions, and it is perceived as significantly less resolved than constructive or unresolved analog IPC (López-Larrosa, Sánchez-Souto et al., 2019). Findings regarding the characteristics of behavioral regulation have been inconclusive. According to P. T. Davies and Forman (2002), high negative emotional reactivity relates to high levels of either involvement in IPC or avoidance. Some studies have found that adolescents are less prone to interfere in analog destructive conflict compared to constructive and unresolved analog IPC sequences (López-Larrosa, Sánchez-Souto et al., 2019). Other studies suggest that adolescents get involved in destructive IPC when their confidence in their parents’ ability to solve disagreements decreases (Goeke-Morey et al., 2013), yet others report that adolescents avoid destructive conflicts in general (P. T. Davies & Martin, 2014).
In line with EST, adolescents may feel either emotionally secure or insecure in their families. Emotional insecurity may manifest as preoccupation and anxiety about their family or disengagement in family processes (e.g., pretending that they do not care about the family) (Forman & Davies, 2005). Emotional security variables have been shown to correlate with IPC variables such as conflict properties, threat, and content of conflict. Thus, higher incidences of destructive conflict properties (i.e., conflict intensity, frequency, stability, or irresolution) are correlated with lower adolescents’ emotional security in the family. Conflict content predicts increased insecurity (preoccupation and disengagement) in the family (López-Larrosa, Mendiri et al., 2019). These dimensions also relate to adolescents’ interference in family conflict. Children and adolescents who experience high, frequent, and unresolved conflict tend to feel guilty, threatened, preoccupied, and insecure, and to feel less confident about solving IPC, leading them to interfere less often in family conflicts (Grych et al., 2004; López-Larrosa et al., 2012b; Rhoades, 2008).
Moreover, it is important to differentiate among the negative affects endorsed by adolescents following exposure to IPC, as adolescents who felt threatened by IPC tried to avoid conflict involvement; meanwhile, adolescents who felt guilty tended to get involved in family conflict (Shelton & Harold, 2008). Aside from characteristics of the conflict and adolescents’ emotional, cognitive, and behavioral responses to them, adolescents’ interference in family conflict also depends on age, gender and family type. Older adolescents tend to interfere more often (Davies et al., 1999; De Arth-Pendley & Cummings, 2002; Goeke-Morey et al., 2013; Shifflett-Simpson & Cummings, 1996), although another study has identified young adolescents as highly involved in their parents’ conflicts (Davies et al., 2015). Boys tend to interfere more than girls do (Davies et al., 1999), but this pattern is reversed when conflicts are constructive in nature (López-Larrosa, Mendiri et al., 2019). Based on these discrepant findings on age and gender differences, our predictive analyses will include both variables as potential predictors of adolescents’ interference in family conflict. Regarding family type, it has been found to be associated with adolescents’ cognitive representations of analog IPC but not with adolescents’ interference in IPC. López-Larrosa, Sánchez-Souto et al. (2019) found that adolescents from single parent families perceive constructive analog IPC as significantly less resolved than adolescents from two parent families. In single parent families (due to parental divorce, compared to two parent families), adolescents perceive IPC as significantly more stable and parents as less efficient in solving it (López-Larrosa, Sánchez-Souto et al., 2019). This may explain the adolescents’ perception of constructive IPC as less resolved.
Post-divorce IPC has a negative impact on children and adolescents of divorced single parent families (Hayes & Birnbaum, 2020). Observe that the lack of differences in interference in IPC does not mean that there would also be no differences in interference in family conflict. Considering adolescents’ differences in the cognitive responses to IPC (depending on their family type) and the possible differences in responses to IPC and family conflict, we have also considered family type as a predictive variable of adolescents’ interference in family conflict.
We have not found references to the Spanish rural or urban background of adolescents and their involvement in IPC, but there is some international evidence of the role that family conflict has on rural Latina children (Dixon De Silva et al., 2020), so the rural or urban background of our participants will also be considered as a potential predictor of adolescents’ interference in family conflict.
Adolescents in Residential Care
In Spain, in the year 2019, there were 23,209 children and adolescents in residential care, of which, the highest percentage were adolescents in the age range of 11–14 (20%) and 15–17 (60%) (Observatorio Infancia, 2020). More specifically, in the region where this study was undertaken (Galicia) there were 1082 children and adolescents. However, Galicia’s official statistics do not report age groups and, thus, the actual number of adolescents in residential care is not publicly available (see the last available report at (Xunta, 2018)).
Most research on adolescents’ responses to IPC has studied community samples, that is, adolescents living with their families, while much less research has been devoted to exploring adolescents in other contexts, such as those living in residential care (RC). In Spain, the majority of children and adolescents in RC (55%) are hardship cases (ex-lege) (Observatorio Infancia, 2020). RC adolescents are under the protection of local authorities because of disadvantaged family circumstances (Campbell et al., 2000; Del Valle & Bravo, 2013; Mäntymaa et al., 2012) that put them at risk for maladaptive biopsychosocial development (American Academy of Pediatrics, 2012). RC adolescents have been identified as being more emotionally insecure than community adolescents, and, on average, they have been exposed to higher rates of destructive conflict (López-Larrosa, Mendiri et al., 2019). Their emotional and cognitive responses to analog IPC seem to differ from those of community adolescents, mostly when they are exposed to constructive conflict (López-Larrosa, Sánchez-Souto et al., 2019). Although RC adolescents are temporarily separated from their parents, they may see their families occasionally while still in RC, or they may return to their family homes when they leave child protection (Atwool, 2013). Studies have found that destructive conflict increases in the family when adolescents enter RC centers (Mowen & Boman, 2018). Even though adolescents are not present during the conflict, these destructive conflicts affect parent-children relationships negatively and may make family reintegration more difficult (Mastrotheodoros et al., 2019).
Summing up, adolescents, compared to other age groups, represent a major group of users of residential care in Spain. Additionally, adolescents seem to be sensitized to IPC, suffer higher family conflict when they enter RC centers and they are developmentally closer to leaving child protection services. As a matter of fact, it is common that they return to their family homes (it happens in 52% of the cases (Campos, 2013)). This supports our choice to focus this study on this age group and, furthermore, the number of cases in our sample is representative with respect to the population of adolescents in Galicia’s RC centers.
Key Questions
In this study, the key questions are the following: Can we predict adolescents’ interference in family conflict by studying their responses to simulated IPC? Can other variables such as conflict dimensions or emotional security in the family predict adolescents’ interference in family conflict? Are RC adolescents at greater risk of being involved in family conflict when they are with their families compared to community adolescents? Are age, gender, family type or location predictive variables of involvement in family conflict?
Methods and Material
Participants
Demographics of the participant sample.
Measures and Material
Children’s perception of interparental conflict scale (CPIC)
The CPIC assesses how children and adolescents perceive IPC in their families (J. H. Grych & Fincham, 1990; Grych et al., 1992). The original CPIC was translated and adapted to Spanish as “Escala de Percepción de los Hijos/as del Conflicto Interparental” (Iraurgi et al., 2008; Martínez-Pampliega, 2008), comprising 36 items. Participants indicate how well each item portrays their parents’ arguments. A 3-point Likert-type scale with values ranging from 1 to 3 (true, almost true, and false, respectively) is used. The items belong to nine subscales: Intensity (strength of the conflict, e.g., “When my parents have an argument, they yell a lot”), Frequency (conflict’s recurrence, e.g., “I often see my parents argue”), Stability (conflicts due to parents being unhappy together or lack of love, e.g., “My parents have arguments because they are not happy together”), Resolution (conflicts ending up with no solution, e.g., “My parents still act mean after they have had an argument”), Triangulation (feelings of being caught in the middle, e.g., “My mom wants me to be on her side when she and my dad argue”), Content (the theme or reason of the conflict, e.g., “My parents’ arguments are often about something I did”), Self-Blame (feeling responsible for the conflict, e.g., “Even if they don’t say, I know I am to blame when my parents argue”), Coping (feeling incapable of doing something when their parents argue, e.g., “I do not know what to do when my parents have arguments”), and Threat (feeling worried or scared for themselves or for their parents, e.g., “I get scared when my parents argue”) (López-Larrosa, Mendiri et al., 2019). In this study, the internal consistencies of the subscales are α = .78 (Intensity), α = .84 (Frequency), α = .80 (Stability), α = .81 (Resolution), α = .60 (Triangulation), α = .82 (Content), α = .74 (Self-Blame), α = .76 (Coping), and α = .74 (Threat).
All CPIC items and the nine subscales will be used to predict adolescents’ interference in family conflict. In machine learning, most models tolerate redundancy among predictors. Redundant items or subscales are therefore deleted only when necessary, which avoids filtering the data a priori.
Security in the family system scale (SIFS)
The SIFS assesses adolescents’ perceived security in their families (Forman & Davies, 2005). The original scale has 24 items. We used the Spanish translated version (López-Larrosa et al., 2016). A 5-point Likert-type scale with anchors 1 (strongly disagree) to 5 (strongly agree) is used to answer each item. Items belong to three dimensions: Preoccupation (concerns about their future and their families, e.g., “I have the feeling that my family will go through many changes that I won’t expect”), Disengagement (disconnection from their families, e.g., “When something bad happens in my family, I wish I could live with a different family”), and Security (confidence in the family, e.g., “I feel I can count on my family to give me help and advice when I need it”). The three subscales comprise 20 of the 24 items (López-Larrosa et al., 2012b). In this study, the internal consistencies of the subscales are α = .78 (Preoccupation), α = .78 (Disengagement), and α = .87 (Security). The 24 SIFS items and the three subscales will be used to predict adolescents’ interference in family conflict.
“How does my family behave when we have arguments” questionnaire
This questionnaire was created specifically to collect additional data about family conflicts. Adolescents are asked about the frequency and strength of conflicts in the family using a Likert-type scale with anchors 0 (nothing) to 3 (a lot) and about how long they have been experiencing conflicts in the family with values ranging from 0 (never) to 5 (forever). Then, they are asked who is involved in family conflicts using an open-ended question. They have to mark what happens during family conflicts and describe the conflict process by endorsing one of the following options: yells, insults or threats, mutual blaming, one always wins, etc. The next question explores what adolescents do in response to the conflict; in particular, the response option “I get involved” is our main dependent variable. The last question explores affection: how adolescents show affection in their families (open-ended question) and how often they show affection, using a Likert-type scale with values from 0 (nothing) to 3 (a lot). All questions are then coded using binary values (1 if the option was selected and 0 otherwise).
Conflict vignettes
Eight visual conflict vignettes were created and edited (Sánchez Souto & López-Larrosa, 2016). The vignettes depicted short sequences of family conflicts about finances (vignette 1), leaving school (vignette 2), children’s curfew (vignette 3), children school problems (vignette 4), watching a particular television program (vignette 5), getting home late (vignette 6), washing the dishes (vignette 7), and in laws (vignette 8). The vignettes portray different conflicts to avoid adolescents’ satiation when they are watching the videos. Each vignette depicts a different heterosexual couple and comprises two parts: the conflict situation (one minute) and the ending (15 seconds). The endings are either constructive, unresolved, or destructive. Thus, for each of the eight vignettes, there are three possible endings. In the destructive ending, the conflict escalates with intense negative emotions and raised voices. In the constructive ending, the couple reaches an agreement and displays positive emotions and affection. In the unresolved ending, the conflict is unfinished as one partner leaves the scene.
“My opinion about the video” (MOV) questionnaire
The MOV questionnaire (López-Larrosa, Sánchez-Souto et al., 2019) measures adolescents’ emotional reactivity (the intensity and the type of emotion generated), internal cognitive representations of the constructiveness of the situation (how resolved the conflict was), and behavioral regulation (what they would do in a similar situation) for each of the conflict vignettes they are shown. Emotional reactivity, cognitive representations and behavioral regulation are calculated for each set of constructive, destructive, and unresolved conflicts shown to each participant.
To evaluate emotional reactivity, participants are asked to identify their emotions after watching each video. There are positive emotions such as “happy” and “feeling well,” and negative emotions such as “angry,” “scared,” and “sad.” Once participants have identified their emotions, they rank the intensity of that positive or negative emotion, using a Likert-type scale with anchors 0 (Nothing) to 10 (Very High). Each participant completed a measure of positive emotional reactivity and a measure of negative emotional reactivity for each vignette. Total positive emotional reactivity is calculated by summing the scores for each pair of constructive, destructive, or unresolved conflicts that the participant has seen (see Section 3.4 Procedures). Total negative emotional reactivity is calculated by summing the scores for each pair of constructive, destructive, or unresolved conflicts. Values range from 0 to 20 for both positive and negative emotional reactivity.
In order to measure cognitive representations of conflict resolution, participants are asked “is the problem resolved?” A Likert-type scale with anchors 0 (Nothing) to 10 (Very High) rates the degree of resolution of the conflict situation from “not resolved” to “very highly resolved.” Constructiveness is characterized by a high degree of resolution, and destructiveness is characterized by a low degree of resolution. The cognitive representation scores for each pair of constructive, unresolved, and destructive conflicts are summed, so each total ranges from 0 to 20.
In order to identify adolescents’ behavioral regulation and following the MOV protocol, participants are asked “What would you do if you were in the same room with them?” There are two distinct behavioral responses: “leave the room” and “get involved/interfere in the conflict.” Values range from 0 (Leave) to 10 (Get Involved). Scores near 0 reflect a relative likelihood of choosing to leave, and scores near 10 indicate a relatively greater likelihood of getting involved. Adolescents rate their likelihood of actions in the continuum of leaving the room to getting involved in the conflict. The behavioral responses of each pair of constructive, unresolved, and destructive conflicts are summed up and range from 0 to 20. Thus, the closer the value is to 0, the more likely they would leave the room, and the closer it is to 20, the more likely the adolescents would interfere.
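The pair-wise totals described for the MOV measures can be illustrated with a minimal sketch. It assumes each participant rates two vignettes per ending type on the 0 to 10 scale; the function name, the tuple layout, and the example scores are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict


def mov_pair_totals(ratings):
    """Sum 0-10 ratings over the two vignettes seen for each ending type.

    `ratings` is a list of (ending, score) tuples, one per vignette.
    This layout is a hypothetical convenience, not the authors' format.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ending, score in ratings:
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        totals[ending] += score
        counts[ending] += 1
    for ending, n in counts.items():
        if n != 2:
            raise ValueError(f"expected 2 vignettes for {ending!r}, got {n}")
    return dict(totals)


# Behavioral regulation for one made-up participant:
# 0 = leave the room, 10 = get involved.
example = [
    ("destructive", 2), ("destructive", 3),
    ("unresolved", 6), ("unresolved", 5),
    ("constructive", 8), ("constructive", 9),
]
behavior_totals = mov_pair_totals(example)
print(behavior_totals)
```

Each total lies between 0 and 20, matching the range stated for the summed scores. The same function would apply to the emotional reactivity and resolution ratings, since they use the same 0 to 10 scale.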
Procedures
The Department of Psychology of the first author’s University initially approved the study. After that, the Department of Family, Children and Demographic revitalization (Dirección Xeral de Familia, Infancia e Dinamización Demográfica) of the local Government (Galicia) granted permission for researchers to undertake data collection. Once the study was approved by the local authorities, the principals of the residential care centers gave their permission to contact adolescents residing in those centers.
Adolescents were informed of the study once principals gave their permission, and those who assented participated. The community subsample of adolescents currently living with their families was contacted through schools. Principals and teachers were consulted for approval. In those schools where data collection was approved, parents received informed consent letters to give their permission to contact their children. Only adolescents living with their families who assented and whose parents consented were able to participate in this study.
The same researcher (the second author) was present to supervise and collect the data. All participants completed the CPIC and SIFS first. Then, they answered the “How does my family behave when we have arguments” questionnaire. Then, they were presented with the first of the six video-recorded conflict vignettes and answered the MOV for that vignette. Each participant saw six different vignettes of the eight possible vignettes. The six vignettes were chosen for each participant using automated randomization. Two vignettes showed a destructive ending, two showed an unresolved ending, and two showed a constructive ending. Only one version of each vignette (either destructive, unresolved or constructive) was shown to each participant. As topics varied across videos, the aggregated results should be more generalizable to conflicts commonly seen in naturalistic settings. The order of the vignette endings was counterbalanced, but the last one to be shown was always constructive to reduce harms and risks to the participants (see López-Larrosa, Sánchez-Souto et al., 2019). Participants were instructed to view the vignette and respond to the MOV, and the procedure was repeated for each vignette. Participants viewed the vignettes alone in a facility room provided by a school or a residential care center.
Calculation: Data Preparation and Predictive Algorithms
The main challenge consists of building predictive technology based on the available variables (e.g., demographic variables and responses to questionnaires). From a machine learning perspective, we considered a two-class classification problem in which the target variable (“I get involved,” from “How does my family behave when we have arguments” questionnaire) is a binary variable that encodes the behavior of the adolescent when there is a conflict in the family. More specifically, this variable is set to 1 when the adolescent states that he/she tends to interfere in the conflict (and 0 otherwise). Such interference represents a risk for the adolescent, and it is important to develop automatic methods to predict such interaction between the adolescent and other family members. As described above, we obtained data from 251 adolescents. However, three participants did not provide a response for the question associated with the target variable, and another adolescent provided no responses for any of the questions related to the conflict vignettes. We therefore removed these four participants and focused on the remaining 247 adolescents (144 of them have the target variable set to 0 and 103 of them have the target variable set to 1).
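The sample arithmetic above can be checked with a short sketch. The counts come from the text; the majority-class baseline is an added reference point (the accuracy of always predicting class 0), not a figure reported by the authors.

```python
# Counts stated in the text.
n_recruited = 251
n_missing_target = 3      # no answer to the target question
n_missing_vignettes = 1   # no answers to any vignette question
n_analyzed = n_recruited - n_missing_target - n_missing_vignettes

class_counts = {0: 144, 1: 103}  # 1 = "I get involved"
assert sum(class_counts.values()) == n_analyzed

# Added reference point (not from the source): accuracy obtained by
# always predicting the most frequent class.
majority_class = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_class] / n_analyzed
print(n_analyzed, majority_class, round(majority_baseline, 3))
```

Under these counts, a trivial classifier would be correct for roughly 58% of the 247 adolescents, which gives some context for interpreting the accuracy of the learned models.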
Predictive Variables
Predictive Variables.
Variables derived from the “How does my family behave when we have arguments” questionnaire (section 3.3).
Variables derived from the MOV questionnaires (section 3.3.2).
Following standard practice, most categorical predictors were converted into dummy variables (binary values) that represent each possible level of the original variable. Some variables have unanswered responses and, in such cases, we often opted to include an additional dummy variable to represent the lack of answer. For missing data in numerical variables, we imputed the null values with the most frequent value (mode) across the records with non-null answers. The original set of predictors had 120 variables which, after dummification, led to a set of 230 predictive features.
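The two preparation steps described above (dummy coding with an optional "no answer" level, and mode imputation for numerical variables) might look as follows. This is a minimal sketch in plain Python rather than the R tooling the study relied on; the function names, the use of `None` for a missing answer, and the toy values are assumptions made for illustration.

```python
from collections import Counter

MISSING = None  # assumed marker for an unanswered question


def impute_mode(values):
    """Replace missing numeric answers with the most frequent observed value."""
    observed = [v for v in values if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in values]


def dummy_code(values, name):
    """Build one 0/1 column per observed level, plus a no-answer column."""
    levels = sorted({v for v in values if v is not MISSING})
    columns = {
        f"{name}_{level}": [int(v == level) for v in values]
        for level in levels
    }
    if any(v is MISSING for v in values):
        columns[f"{name}_missing"] = [int(v is MISSING) for v in values]
    return columns


# Toy data for four hypothetical participants.
ages = impute_mode([15, 16, MISSING, 15])
gender = dummy_code(["girl", "boy", MISSING, "girl"], "gender")
print(ages)
print(gender)
```

Note that the text says an extra dummy variable for missing answers was included "often", not always, so the unconditional rule in `dummy_code` is a simplification.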
Feature Selection
The aforementioned set of features or predictors contains a large set of variables, but some of them may be irrelevant, redundant, or have little importance in the final model. Feature Selection is a core component in Machine Learning technology as it can hugely impact the performance of predictive models. In this section, we explain the steps taken to remove the irrelevant or less important predictors that do not contribute much to our target variable.
First, we removed near-zero-variance predictors. These predictors either have one unique value (i.e., their variance is zero) or have the two following characteristics: they have very few unique values relative to the number of samples, and the ratio of the frequency of the most common value to the frequency of the second most common value is large. This resulted in the removal of the variables positiveemotion12, positiveemotion34, and some dummy variables associated with some levels of the variables bornSpain, fatherSpain, motherSpain, liveswith, other, resembleswho, and affectionfrequency.
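The near-zero-variance rule described above can be sketched as a small predicate. The two cutoffs (a frequency ratio of 95/5 and 10% unique values) are assumptions chosen for illustration; the text does not report the thresholds that were actually used, and this is not a port of any specific library function.

```python
from collections import Counter


def is_near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a predictor that is constant or dominated by a single value.

    freq_cut:   minimum ratio of most common to second most common value.
    unique_cut: maximum percentage of distinct values among all samples.
    Both defaults are illustrative assumptions.
    """
    if not values:
        raise ValueError("empty predictor")
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # zero variance: a single unique value
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


rare_dummy = [0] * 98 + [1] * 2       # one level almost never selected
balanced_dummy = [0] * 60 + [1] * 40  # both levels well represented
print(is_near_zero_variance(rare_dummy), is_near_zero_variance(balanced_dummy))
```

A dummy variable for a rarely chosen level, such as a country of birth reported by only a couple of participants, is the typical case this filter removes.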
Second, we reduced correlations between the predictors. Many predictive models benefit from reducing the level of correlation between the predictors. To meet this aim, we employed the findCorrelation function of R’s caret package. This function computes pair-wise correlations, and when two predictors have an absolute value of correlation higher than a given cutoff (0.9 in our case), the function looks at the mean absolute correlation of each predictor and removes the predictor with the largest mean absolute correlation. This resulted in the removal of daysaweekwithfamily and 35 dummy variables associated with some levels of the variables liveswith, residentialcare, liveswithfamily, othersarguing, motherarguing, and several CPIC items (4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36).
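The pairwise rule described above can be illustrated with a simplified Python sketch. It follows the textual description (drop the member of a highly correlated pair that has the larger mean absolute correlation) and is not a reimplementation of caret's findCorrelation, whose exact procedure may differ; the function names and toy columns are hypothetical, and the sketch assumes at least two non-constant columns.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, cutoff=0.9):
    """Return the names of the predictors kept after the pairwise filter."""
    names = list(columns)
    corr = {a: {b: abs(pearson(columns[a], columns[b])) for b in names if b != a}
            for a in names}
    mean_abs = {a: sum(corr[a].values()) / len(corr[a]) for a in names}
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if corr[a][b] > cutoff:
                # remove the predictor with the larger mean absolute correlation
                dropped.add(a if mean_abs[a] >= mean_abs[b] else b)
    return [n for n in names if n not in dropped]


# Hypothetical predictors: q1 and q2 are almost collinear, q3 is not
toy = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 4, 6, 8, 11],
    "q3": [5, 1, 4, 2, 3],
}
kept = drop_correlated(toy, cutoff=0.9)
```

In this toy case q1 and q2 correlate at roughly 0.996, and q1 has the slightly larger mean absolute correlation with the other columns, so q1 is the one removed.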
Last, we checked for linear dependencies (using QR matrix decomposition) between the remaining features, and we also ran univariate filters using ANOVA. Neither of these methods removed any further predictor. As a result of this feature selection process, we obtained a dataset with 169 predictors.
Predictive Models
In this section, we describe the learning models used for this two-class classification problem. All of them (and the feature selection steps described above) are available from R’s caret package (Kuhn, 2008):
• K-nearest Neighbors (KNN) (Dasarathy, 1991) classifiers are non-parametric learning methods that do not fit a model. Given a test case, the k training cases closest in distance to the test case are calculated, and the test sample is then classified using a majority vote among its k “neighbors.” It has a single parameter, k (the number of neighbors).
• Naïve Bayes (NB) (Duda & Hart, 1973) is a traditional probabilistic classifier based on Bayes’ rule, assuming that the predictors are independent. Despite this assumption, NB classifiers sometimes outperform more advanced alternatives. The parameters are FL (Laplace smoothing parameter, which smooths the probability estimates and helps to reduce the variance of the resulting model) and usekernel (chooses between kernel density estimation and normal density).
• Support Vector Machines (SVM) (Burges, 1998) are well-known learning methods that try to find the optimal frontier between the two classes. The optimal boundary is the hyperplane that maximizes the margin between the cases belonging to each class. To this aim, SVM computes the “support vectors,” which are those training cases that lie close to the decision boundary. SVMs utilize kernel functions to handle problems that are not linearly separable. These kernels are devices that project the original space to a higher dimensional space. The kernels tested were polynomial (parameter: p, degree of the polynomial), Gaussian radial basis (parameter: σ, width of the Gaussian) and linear (i.e., no kernel). Another key parameter of SVMs is C, the regularization parameter, which allows the handling of non-separable problems by relaxing the constraints. A large C gives high penalties to classification errors at the training stage, while a small C is more flexible on errors.
• C5.0 is an evolution of C4.5 (Quinlan, 1993). It takes a training set of cases and grows a tree where each leaf (final node) is a decision (a specific value for the target variable), and each internal node (non-final) represents a test involving some predictor. The tree is built by iteratively splitting the training set into smaller subsets. At each node, the algorithm chooses the predictor that most effectively divides the data (with regard to the target variable). The splitting method utilizes Information Gain. C5.0 employs boosting and is able to estimate the importance of the predictors (through a method known as winnowing, which is especially useful in high-dimensional spaces). C5.0 has the following parameters: trials (number of boosting iterations), winnow (whether or not to filter irrelevant features) and model (defines the type of output, which can be either rules or tree).
• Random Forests (RF) (Breiman, 2001) are ensemble methods that build multiple decision trees at the training stage. To this aim, they employ several bootstrapped samples of the training set. The prediction of the target variable for a test case results from combining the individual predictions of the decision trees. To grow each individual tree, a random sample (with replacement) is extracted from the training cases (bagging). RF “de-correlates” the trees by applying random predictor selection (the predictor that is used to partition the data at each node of the tree is selected amongst a random subset of predictors). RF has a single parameter, mtry, the number of predictors that are sampled randomly at every node.
• AdaBoost (Freund et al., 1999) repeatedly applies weak learners on the training data. The algorithm maintains a set of weights associated with the training cases. Initially, all weights are the same. After each round, the weights of the cases that were incorrectly classified are increased. In this way, in the next round each learner is forced to focus on the misclassified cases. Based on prediction accuracy, a confidence score is assigned to each learner. Test cases are classified through a weighted combination of the predictions of all the learners. AdaBoost has the following parameters: iter (number of iterations), maxdepth (maximum depth of the trees; decision trees are used as weak learners), and ν (nu, the learning rate parameter).
• Stochastic Gradient Boosting Models (GBM) (Friedman, 2002) are also ensemble learners that focus on the instances that are hard to predict (and often employ trees as weak learners). After each round, the distance between the prediction of the weak learner and the correct outcome is used to represent the “error rate” of the learner. These errors are subsequently used to calculate the gradient, which is employed to find the direction in which to change the model’s parameters in order to reduce the error in the next round. The model has the following parameters: ntrees (total number of trees to fit), interactiondepth (maximum depth of each tree), and shrinkage (a learning rate parameter applied to each tree in the expansion, where a smaller learning rate typically requires more trees).
• Partial Least Squares (PLS) (Wold, 1975) performs dimensionality reduction by transforming the original space of predictors into a new subspace that supports prediction of the target variable based on a small number of predictors. To this aim, it finds a linear subspace of predictors that maximizes the covariance with the target variable. The derived directions are orthogonal. It has a single parameter, ncomp (the number of dimensions in the reduced subspace).
• Linear Discriminant Analysis (LDA) (Fisher, 1936) is a well-known classification method that assumes that the classes come from normal distributions (with the same covariance matrix and the same prior probabilities). LDA computes the linear combination of predictors that maximizes the between-class variance relative to the within-class variance. In this way, it guarantees maximal separability. The method has no parameters.
• Shrinkage Discriminant Analysis (SDA) (Ahdesmäki & Strimmer, 2010) is particularly useful when we cannot employ LDA because the number of predictors is large relative to the number of cases. SDA computes a regularized estimate of the within-class covariance matrix. To this aim, it employs James-Stein shrinkage rules for training the classifier and a variance-correlation decomposition of the covariance matrix. It utilizes correlation-adjusted t-scores (cat) for selecting predictors. Such an approach implements an effective ranking of predictors, even in the presence of correlation. A false non-discovery rate thresholding technique eliminates predictors that are not useful for distinguishing between the two classes. The parameters are the regularization parameter, λ (λ = 0 means no shrinkage, while λ = 1 leads to complete shrinkage), and the diagonal parameter, which determines whether or not the t-scores are employed to rank the predictors.
• Flexible Discriminant Analysis (FDA) (Hastie et al., 1994) applies multivariate linear regression on the target variable. To meet this aim, it constructs a response matrix formed by two columns (one per class) in which a case has a value 1 in the i-th column if the case belongs to the i-th class, and 0 otherwise. Next, the LDA solution is obtained from linear discriminant analysis of the values fitted by multivariate regression analysis. The method has two parameters: degree (maximum degree of the regression model) and nprune (maximum number of terms in the model).
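As a concrete illustration of the simplest classifier in the list above, the following Python sketch implements the majority vote of K-nearest Neighbors. It is a toy example, not the caret implementation used in the study; the function name, the Euclidean distance, and the data points are assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases."""
    # sort the training cases by Euclidean distance to the query and keep the k closest
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature cases; labels mimic the binary target (0 / 1)
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
far_corner = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
```

No model is fitted in advance: all the work happens at prediction time, which is why k is the only parameter to tune.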
Experiments and Results
The classification strategies described above are examples of supervised learning, the machine learning task of inferring a function from labeled training data. In supervised learning, it is crucial to assess the predictive capability of the models with data other than those used to build them. The aforementioned classification approaches were therefore evaluated with 4-fold cross validation (CV). CV repeats the optimization and validation process over multiple train-test splits, so that every available case is included in a test split exactly once per repetition. In our experiments, each of the 4 train splits consisted of 75% of the cases and each test split of the remaining 25%; we repeated the 4-fold CV process five times and reported the average performance.
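The resampling scheme can be sketched in a few lines of Python. This is an illustration of repeated 4-fold CV on 247 cases, not the authors' code: the function name and the seed are hypothetical, and, unlike some library implementations, the folds here are not stratified by class.

```python
import random


def repeated_kfold(n_cases, k=4, repeats=5, seed=0):
    """Yield (repetition, fold, train_indices, test_indices) tuples."""
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_cases))
        rng.shuffle(order)  # a new random partition for every repetition
        folds = [order[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            # the training split is the union of the other k - 1 folds
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield rep, i, train, test


splits = list(repeated_kfold(247, k=4, repeats=5))
n_splits = len(splits)
test_sizes = sorted(len(test) for rep, _, _, test in splits if rep == 0)
```

Each model would be fitted on `train` and scored on `test` for all 20 splits, and the 20 scores averaged. Because 247 is not divisible by 4, the test folds contain 61 or 62 cases, so the 75%/25% proportions are approximate.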
Accuracy of different two-class classifiers.
Confusion matrices of the most effective models. The figures reported are averages over the 4 folds.
Importance of the Predictive Variables
In many applied domains, it is important to not only have an effective predictive model, but also an interpretable model. In our case, knowing which predictors are more important in determining the target variable has a number of advantages. First, it may help researchers and professionals to identify crucial variables, which can improve data selection methodology. Second, from a practical perspective, it can simplify the way in which researchers and professionals capture data. For example, non-relevant questions could be eliminated from existing questionnaires. Another interesting outcome of this analysis could be that only the most effective dimensions are measured in future studies; for instance, only the most effective types of video vignettes would be shown to adolescents. Such simplification techniques are appealing, as the adolescents that we target are sometimes under distress, and it is important both ethically and functionally to reduce the effort they need to put into psychological studies. Although most of the predictive algorithms described above are not directly interpretable, we can still employ some analytical tools to estimate the importance of the predictors. To this aim, we followed Kuhn (2007), who defined a number of metrics that estimate variable importance.
Variable importance estimates.
Meaning of the most important SIFS and CPIC variables.
This analysis reveals a number of interesting patterns and results. Adolescents’ gender and type of residence (living or not with family) are predictive of involvement in family conflict only in model-independent estimations, while adolescents’ gender is predictive of involvement in model-independent estimations and RF. In any case, neither of these variables is among the top predictive variables. Several predictors associated with the SIFS and the “How does my family behave when we have arguments” questionnaire are regularly at the top positions. This suggests that the SIFS and the “How does my family behave when we have arguments” questionnaire may be regarded as solid tools to anticipate adolescents’ involvement in family conflict. With regard to the SIFS, the most predictive items are SIFS 1, 3, 11, 14, and 22. SIFS 3 and 14 belong to the Preoccupation subscale, while SIFS 11 and 22 belong to the Disengagement subscale. The SIFS1 variable (see Table 8) was removed from previous studies (Forman & Davies, 2005; López-Larrosa et al., 2016) but our current experiments show that it is a top predictor for the classifiers. All SIFS subscales (Preoccupation, Disengagement and Security) are at the top of the predictive dimensions. As for the “How does my family behave when we have arguments” questionnaire, it seems that the presence of insults, the strength of the arguments, the “whenconflicts” variable that refers to how long conflict has been happening in the family, the “onealwayswins” variable that refers to one family member always winning the fights, and an obvious variable such as “mearguing” (the adolescent is the one that argues) are the top predictive variables of adolescents’ involvement in family conflict.
Interestingly, analyses revealed that the video vignettes themselves are also useful predictors of adolescents’ interference. The hypotheticalbehaviour and negativeemotion variables, which are obtained after the adolescents watched video vignettes of simulated IPC, are important in terms of prediction capabilities. The behaviors that adolescents endorse when watching either destructive (12 endings) or constructive conflicts (56 endings) are relevant predictors of involvement in family conflict, together with negative emotions when watching destructive, constructive and unresolved conflicts. On the other hand, the predictors associated with the CPIC scale seem to be less influential. One single item (CPIC19) from the triangulation subscale is located in the top predictive positions in the model-independent estimation, and the Threat subscale is in the top predictive position in the PLS estimation.
Explaining the Predictions
To further understand the reasons behind predictions, we employ the Local Interpretable Model-agnostic Explanations (LIME) approach proposed by Ribeiro et al. (2016). This approach provides useful insights into the predictive models. LIME, which can be applied to any regression or classification model, represents an attempt to make predictive models at least partly understandable. It rests on the assumption that every predictive model is linear on a local scale, and it builds simple fits around individual cases. These fits mimic how the global model behaves at a local level. Essentially, the prediction function of the model is approximated by locally fitting linear models to permutations of the original training data. For each case to be explained, a linear model is fit to the permuted samples, which are weighted by their proximity to that case; the fitted coefficients permit the computation of how much and in which way each predictor contributes, approximately, to the decision of the model.
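The local-surrogate idea behind LIME can be illustrated with a small, self-contained Python sketch. It is a deliberate simplification and not the LIME package: it perturbs a case with Gaussian noise, weights the perturbed samples with an exponential proximity kernel, and fits a weighted least-squares line. All names (`explain_locally`, `solve`, `scale`, `width`) and the toy black-box functions are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explain_locally(black_box, instance, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Return one local linear coefficient per predictor around `instance`."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturbed copy of the case
        rows.append([1.0] + z)                             # leading 1.0 is the intercept
        targets.append(black_box(z))                       # query the opaque model
        weights.append(math.exp(-math.dist(z, instance) ** 2 / width ** 2))
    # weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(d + 1)] for i in range(d + 1)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(d + 1)]
    return solve(xtwx, xtwy)[1:]  # drop the intercept


# Hypothetical black boxes: one exactly linear, one logistic
linear_coefs = explain_locally(lambda z: 2 * z[0] - 3 * z[1] + 1, [1.0, 2.0])
logistic_coefs = explain_locally(
    lambda z: 1 / (1 + math.exp(-(3 * z[0] - z[1]))), [0.0, 0.0])
```

The sign of each coefficient plays the role of the bar colour in a LIME plot (support versus contradiction), and its magnitude the bar length. The actual method additionally discretizes numerical predictors and selects a small number of them for display.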
As argued above, the SDA classifier is the best performer for our tasks, and, thus, this analysis focuses on SDA. LIME allows us to explain the decisions for each individual test case, and we show here six test cases that are classified by SDA as “I do not get involved” (target variable=0, “no involvement decision”), and six test cases that are classified by SDA as “I get involved” (target variable=1, “involvement decision”). These instances are shown in Figures 1 and 2, respectively. For each case, the bar graph shows the most important predictors (from top to bottom). To plot these graphs, we chose to show only the top 10 predictors. These predictors closely match with the top predictors shown in the rankings presented in Table 7 (e.g., insults, mearguing).
Explanation of predictions for 6 cases where the target variable is predicted to be equal to 0 (“I do not get involved”).
Explanation of predictions for 6 cases where the target variable is predicted to be equal to 1 (“I get involved”).

The blue (positive) bars represent the fact that the condition associated with the predictor supports the decision of the classifier. All dummy variables have the form name_of_variable.level. For example, area has two possible responses (1 = urban, 2 = rural) and, thus, area.2 is the binary variable that represents whether or not the respondent comes from a rural area. The top row of Case 1 in Figure 1 is “onealwayswins.1 = 0.” This is the dummy variable associated with one of the responses of the question exploring what happens when there are conflicts in the family from the “How does my family behave when we have arguments” questionnaire (onealwayswins.1 set to 0 means that the adolescent did not mark the response “one always wins”). The fact that “onealwayswins.1 = 0” has a long blue (positive) bar means that this is evidence to support the decision of the classifier (which, in this case, is “no involvement”). This is consistent with the fact that “onealwayswins.1 = 0” is evidence contradicting the “involvement decisions” in Figure 2 (see the negative red bars for this variable in Figure 2).
Following a similar line of reasoning, we can observe that insults during arguments support the likelihood of making an involvement decision (insults.1=1 has positive bars in Figure 2 while insults.1=0 has positive bars in Figure 1). If more than one person argues, then the classifier tends towards an involvement decision (onlyoneargues.1=0 is positive evidence in Figure 2 but negative evidence in Figure 1), and the analysis of the variable “mearguing” reveals that adolescents who report participating in the arguments (mearguing.1=1) are often classified as involvement cases.
The hypothetical behavior variables are numerical predictors that range from 0 to 20. They represent the response of the adolescent to hypothetical situations shown in the video vignettes. A low value means that the adolescent tends to withdraw from the conflict, while a high value means that the adolescent declares that he or she would interfere. The classifier properly encoded this information. For example, a low value of hypotheticalbehaviour56, which evaluates adolescents’ intended behaviors after watching constructive simulated conflict (5 and 6), favors a no involvement decision (hypotheticalbehaviour56 ≤ 10 supports the no involvement decision in Figure 1, while hypotheticalbehaviour56 ≤ 10 contradicts the involvement decision in Figure 2). This suggests that adolescents’ self-reported hypothetical non-involvement after watching constructive simulated conflicts does predict actual non-involvement in family conflict, giving weight and value to adolescents’ self-reports in evaluation of conflict involvement. According to Figure 1, adolescents’ non-involvement in family conflict is predicted by their reported intended non-involvement in simulated constructive conflict vignettes (hypotheticalbehaviour56 ≤ 10), when their mothers or fathers do not expect them to get involved (triangulation) (CPIC19.2 = 0, CPIC31.2 = 0), when there are no insults (insults.1 = 0), when no single family member always wins during conflict interactions (onealwayswins.1 = 0), when there is no blame involved (blame.1 = 0), when they do not show affection often in the family (affectionfrequency.2 = 0), when disagreements are not too strong (strengtharguments ≤ 1), when they do not agree with being able to guess what family members are going to do (SIFS1 ≤ 3) and if they are from rural areas. 
According to Figure 2, the involvement of the studied adolescents in family conflict is predicted by their reported intended involvement in simulated destructive conflict (15 < hypotheticalbehaviour12), by them not seeing their parents having arguments often (CPIC14.2 = 0), when their mothers want them to get involved (CPIC19.2 = 1), when more than one person is involved in the argument (onlyoneargues.1 = 0), when they are part of the conflict (mearguing.1 = 1), when there is blame involved (blame.1 = 1), when there are insults (insults.1 = 1), when conflicts in the family are fairly strong (2 < strengtharguments), when they show each other affection quite often (affectionfrequency.2 = 1), and if they are from urban areas (area.2 = 0).
Discussion
According to previous data, there is some inconsistency in the prediction of adolescents’ involvement in family conflict. We have analyzed variables that explore adolescents’ perceived IPC and family conflict, emotional security, and cognitive, emotional and behavioral responses to family conflict and simulated IPC. We were also interested in exploring whether adolescents in RC were at greater risk for involvement in family conflict and in evaluating any potential differences by age, gender, family type, or location.
Combining information from our predictive models and LIME, our results show that RC adolescents do not seem to be at greater risk for involvement in family conflict compared to adolescents living with their families. These results are in agreement with the lack of differences between community adolescents and adolescents in RC in their involvement in analog IPC (López-Larrosa, Sánchez-Souto et al., 2019).
Age, gender, and family type are not among the top predictive variables of involvement. Unexpectedly, another sociodemographic variable, which refers to adolescents attending schools in rural or urban locations (area), has been shown to have predictive value. Adolescents from rural schools tend to not interfere in family conflict in cases 1 to 6 of Figure 1 but interfere in cases 17, 20, 30, and 38 in Figure 2. Any explanation of this result would be merely speculative.
The emotional and especially the behavioral reported responses to simulated IPC seem to be good predictors of involvement and non-involvement with an interesting differential pattern distinguishing between constructive and destructive conflict. Thus, self-reported hypothetical non-involvement in simulated constructive conflict predicts non-involvement in actual conflict, while self-reported hypothetical involvement in simulated destructive conflict predicts involvement in family conflict. These results support the use of analog technologies as a tool for research and intervention and seem to agree with the claim that social defense strategies activate differently for constructive and destructive conflicts (Davies et al., 2013).
Emotional security dimensions (SIFS dimensions) and specific SIFS items (for instance, SIFS 1 or SIFS 3) are at the top in the predictive models, which supports the emotional security construct and EST (Cummings & Davies, 2010). CPIC items such as CPIC 14, CPIC 19, and CPIC 31 are significant predictors of involvement, which stresses the deleterious role of triangulation (that is, adolescents feeling that they have to take sides) and of the frequency of IPC (López-Larrosa, Mendiri et al., 2019; Bresin et al., 2017). The characterization and properties of family conflicts, such as the presence of insults or blame, the pattern of conflict (e.g., only one wins) and the presence of affection, are also significant predictive variables of adolescents’ involvement or non-involvement in family conflict (Grych et al., 2004; López-Larrosa et al., 2012b; Rhoades, 2008; Shelton & Harold, 2008).
Conclusions
Given the social concern about children and adolescents being physically and emotionally at risk for violence in the family (Hillis et al., 2016), the development of resources and technologies that may be used to inform prevention is of great interest. Machine learning illustrates the innovative use of computers from a psychological perspective. As we have shown, machine learning engages a wide range of algorithms and principles that increase confidence in findings, including the identification of key predictive items, variables and constructs that are indicators of risk for adolescents’ problematic involvement in family conflict. Machine learning thus has advantages compared to other extant statistical approaches: it allows for the use of many potential predictive variables and dimensions in order to identify the most predictive factors, and it identifies risks for involvement for participants that may be missed by current social science approaches. Thus, in this manuscript we have used machine learning technologies to identify predictive variables of involvement in family conflict along with predictive variables of non-involvement, and this information may be useful for prevention (Hillis et al., 2016) and research. It can also be used to simplify instruments for an initial screening of risks and to identify potential protection.
One limitation of this study is that we could have used a scale of involvement in family conflict instead of a dichotomous variable. In addition, our models have a moderate predictive capacity; human beings are complex, and many variables may operate in a particular circumstance, tipping the scale toward either getting involved or not getting involved in family conflicts. Still, this technology is promising for the reasons already mentioned, and, in future research, machine learning may help to refine the combination of variables that operate together to predict either involvement or non-involvement and to explore other predictive dimensions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by projects PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next GenerationEU), RTI2018-093336-B-C21 & RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación & ERDF). The fourth and fifth authors also thank the financial support supplied by the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/01 and GPC ED431 B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System. The third author also thanks the financial support supplied by the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/04, ED431 C 2018/29) and the European Regional Development Fund, which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.
