Abstract
With increasing mammogram rates, identifying attributes of non-attending women entails going beyond differences in demographic groups to reveal complex interactions among personality attributes. In this study, we analyzed survey data from 474 women aged 41 years and older using decision trees. By incorporating personality, religiousness, and age, we were able to correctly classify 42.9 percent of non-attenders compared to 4.4 percent with logistic regression analysis. Our findings suggest that incorporating personality and religiousness attributes may increase non-attender identification. Furthermore, the simple profile generated by decision trees provides a clear map useful for intervention planning.
Introduction
Although it has become normative for women in industrialized nations to have a mammogram every 1 or 2 years (National Center for Health Statistics, 2013), the United States aims to increase these rates by another 10 percent by 2020 (US Department of Health and Human Services, Office of Disease Prevention and Health Promotion, 2015). With so many women already screening, it is increasingly important to understand the attributes of women who do not have mammograms. In the past, women who were not screening were typically targeted on the basis of demographics; however, reaching even higher levels of attendance requires a better understanding of the contributions of personality and religiousness. Therefore, in this research, we used decision trees to develop models that include personality and religiousness attributes. Decision trees allow complex interactions among personality attributes to be examined, for the first time, in the context of mammogram attendance. As expected, decision trees incorporating personality and religiousness attributes outperformed all other models, including those developed with conventional logistic regression.
Baseline comparison: demographic attributes in mammogram attendance
To gauge the true advantages of using personality and religiousness to identify women not having mammograms, a baseline comparison with demographics is necessary. Important demographic attributes linked with increased screening have historically included older age, higher education, a positive family history of breast cancer, and, usually, non-Hispanic White race (e.g. Calvocoressi et al., 2005; Vernon et al., 1992, 1990). In general, attributes linked to screening are reported as a simple list of main effects, although a few studies hint at complex relationships among demographic attributes (Rakowski and Breslau, 2004; Vernon et al., 1992). As Rakowski and Breslau (2004) note, further work is needed to explore interactions among these demographic attributes and other psychological factors.
Personality and religiousness attributes in mammogram attendance
While a better understanding of differences across demographic groups has been instrumental in raising screening rates, models based on demographic attributes alone will not be sufficient to target those not attending. Various personality studies have implicated future time-orientation, conscientiousness, low neuroticism, and low fatalism in mammogram attendance (e.g. Lukwago et al., 2003; Mayo et al., 2001; Schwartz et al., 1999). Although these trends are clear for individual personality traits, research is needed to determine which combinations of these attributes underlie mammogram attendance.
Associations with religiousness attributes are less clear. For instance, religious women may differ in their locus of control: some have high internal control, some work in a collaborative relationship with God, and others have a passive locus in which God is in control. Women with a passive locus of control, especially in health domains, are less likely to have a mammogram (Holt et al., 2007). Furthermore, weekly religious service attendance is associated with higher levels of screening than attending services more or less frequently (Salmoirago-Blotcher et al., 2011).
Finally, personality may interact with levels of religiosity. It is just as easy to imagine a highly religious future-oriented woman having a mammogram as it is to imagine a highly religious fatalistic woman not having a mammogram. Although a few studies have touched upon the complexities of religiousness in mammogram screening (Azaiza et al., 2011; Steele-Moses et al., 2009), a more comprehensive study examining personality and religiousness together is sorely needed.
Decision trees
Identifying non-attending women requires analyses sensitive to complex relationships. Decision trees are especially suited to exploring the interplay of attributes because they can detect interactions and non-linear relationships (Breiman et al., 1984; Strobl et al., 2009). In this context, the goal of decision trees is to identify groups of women who differ in their likelihood of screening. Trees begin by splitting the full sample into smaller groups, or nodes, using whichever attribute makes the women within the resulting groups most similar in their attendance behavior; each resulting group is then split in the same way. When a group of women cannot be split further, it becomes a terminal node.
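As a minimal sketch of this splitting rule, the Python code below searches every attribute and every observed cut-point for the binary split that most reduces Gini impurity. It assumes Gini impurity as the homogeneity criterion (the usual CART default for classification), a binary outcome coded 1 = attender and 0 = non-attender, and hypothetical toy data; it is not the software used in this study.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a node with binary labels (1 = attender, 0 = non-attender)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)  # equals 2p(1 - p)


def best_split(data: Dict[str, List[float]], y: List[int]) -> Optional[Tuple[str, float, float]]:
    """Return (attribute, cut_point, impurity_decrease) for the best split x <= cut_point.

    Every attribute and every observed value is tried as a candidate cut-point;
    the winner is the split whose two child nodes are, on average, most homogeneous.
    """
    n = len(y)
    parent = gini(y)
    best = None
    for name, x in data.items():
        for c in sorted(set(x))[:-1]:  # cutting at the maximum would leave an empty child
            left = [yi for xi, yi in zip(x, y) if xi <= c]
            right = [yi for xi, yi in zip(x, y) if xi > c]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if best is None or gain > best[2]:
                best = (name, c, gain)
    return best


# Hypothetical toy sample of eight women.
toy = {
    "age": [42, 44, 45, 50, 58, 63, 70, 72],
    "conscientiousness": [3.2, 3.8, 2.9, 3.5, 3.6, 3.1, 3.9, 3.4],
}
attended = [0, 0, 0, 1, 1, 1, 1, 1]
split = best_split(toy, attended)
print(split)
```

In the toy data, the cut at age ≤ 45 separates non-attenders from attenders perfectly, so it is chosen over every conscientiousness cut-point. A full tree would apply `best_split` recursively to each child node until a stopping rule, such as a maximum depth, is reached.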
Through these splits, complex relationships are transformed into simple profiles with clear rules, which are particularly useful in applied settings (e.g. Calvocoressi et al., 2005; Demir and Kumkale, 2013; Freitas et al., 2012). Furthermore, unlike commonly used methods such as logistic regression, where all interactions and categorizations must be specified a priori, trees allow any combination of attributes as long as the resulting groups are more homogeneous. Trees also determine each attribute’s optimal cut-point or grouping of categories. In addition to classifying individuals, tree analysis yields a misclassification error, that is, the proportion of individuals who are classified incorrectly by the model.
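Finding an optimal grouping of categories need not involve checking all 2^(K−1) − 1 possible groupings of K categories. For a two-class outcome, Breiman et al. (1984) showed that it suffices to order the categories by their proportion in one class and examine only the K − 1 splits along that ordering. The sketch below applies this shortcut with Gini impurity; the education categories and counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gini_counts(pos: int, n: int) -> float:
    """Gini impurity from the number of attenders (pos) out of n women."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_category_grouping(
    counts: Dict[str, Tuple[int, int]]
) -> Optional[Tuple[List[str], List[str], float]]:
    """counts maps category -> (attenders, total); returns (left, right, weighted_gini).

    Categories are sorted by attendance rate and only the K - 1 splits along that
    order are examined, which for a binary outcome is enough to find the best grouping.
    """
    order = sorted(counts, key=lambda k: counts[k][0] / counts[k][1])
    total_pos = sum(a for a, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    best = None
    left_pos = left_n = 0
    for i in range(len(order) - 1):
        a, n = counts[order[i]]
        left_pos += a
        left_n += n
        right_pos, right_n = total_pos - left_pos, total_n - left_n
        weighted = (left_n * gini_counts(left_pos, left_n)
                    + right_n * gini_counts(right_pos, right_n)) / total_n
        if best is None or weighted < best[2]:
            best = (order[: i + 1], order[i + 1:], weighted)
    return best


# Hypothetical (attenders, total) counts per education category.
education_counts = {
    "no degree": (40, 80),
    "some college": (70, 90),
    "college degree": (90, 100),
}
grouping = best_category_grouping(education_counts)
print(grouping[0], grouping[1])
```

With these hypothetical counts, the best grouping separates women without a college degree from the other two categories.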
Present study
The goal of this research was to develop a better understanding of the attributes of women not following mammogram guidelines. Specifically, we used survey data from women aged 41 years and older (N = 474) to develop logistic regression models and decision trees using various personality- and religiousness-related attributes. We first examined models using demographic characteristics to gauge the improvement that personality and religiousness attributes might offer.
Methods
Sample
This study used prospective data from two modules of the National Survey of Midlife Development in the United States (MIDUS; Ryff et al., 2012a, 2012b, 2013). MIDUS is a longitudinal study of 7108 participants that began in 1995. Participants completing MIDUS II (2004–2006), including a special African-American sample from Milwaukee, were eligible to participate in subsequent modules such as the health-focused Biomarker 4 (2004–2009).
Our sample included women from Biomarker 4 (n = 713). Because mammogram recommendations in 2001 began at the age of 40 years, we excluded women under the age of 41 years to allow time for a first screening (n = 43). After further excluding women with a cancer history (n = 102), multiple women from the same family (n = 55), women with incomplete mammogram status (n = 5), and women with incomplete personality data (n = 34), the final sample contained 474 women aged 41 years and over.
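The exclusion arithmetic can be checked with a short tally. The counts below are those reported above; the final total does not depend on the order in which the exclusions are applied.

```python
# Exclusion counts as reported for the Biomarker 4 women (n = 713).
initial_n = 713
exclusions = {
    "under 41 years": 43,
    "cancer history": 102,
    "multiple women from the same family": 55,
    "incomplete mammogram status": 5,
    "incomplete personality data": 34,
}

remaining = initial_n
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"after excluding {reason} ({n_excluded}): {remaining}")

final_n = remaining  # analytic sample
```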
Measures
Demographics
Participants answered questions regarding age, education, family history, and race (see Table 1 for all variables). Insurance status was not measured concurrently with mammogram status; therefore, the insurance information available from MIDUS II was not used in the present analyses.
Table 1. Descriptive statistics and logistic regression models of mammogram attendance.
SD: standard deviation; CI: confidence interval.
*p ⩽ 0.05; **p ⩽ 0.01.
Personality attributes
Participants in MIDUS II self-reported their conscientiousness, neuroticism, fatalism, and future time-orientation using multi-item scales whose references, factor loadings, and construction details are reported in the MIDUS documentation manuals.
Conscientiousness was measured with five items indicating how hardworking, organized, careful, responsible, and thorough participants were (1 = Not at all, 4 = A lot; α = 0.66). Neuroticism was measured by asking participants to rate how moody, nervous, worried, and not calm they were (1 = Not at all, 4 = A lot; α = 0.74). Additionally, participants rated their level of fatalism with two items (α = 0.64): “I have little control over the things that happen to me” and “What happens in my life is often beyond my control” (1 = Strongly disagree, 7 = Strongly agree). Finally, participants rated three reverse-coded future time-orientation items: “I believe there is no sense planning too far ahead because so many things can change,” “I have too many things to think about today to think about tomorrow,” and “I live one day at a time” (1 = Not at all, 4 = A lot; α = 0.62).
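The α values above are Cronbach’s alpha coefficients. The sketch below assumes that a scale score is the mean of its items (the exact MIDUS scoring rules are given in its documentation), reverse-codes the 1–4 future time-orientation items as 5 − x so that higher scores mean greater future orientation, and computes alpha; the five response patterns are hypothetical.

```python
from statistics import mean, pvariance
from typing import List


def reverse_code(x: int, low: int = 1, high: int = 4) -> int:
    """Reverse a Likert response, e.g. 1 <-> 4 and 2 <-> 3 on a 1-4 scale."""
    return low + high - x


def cronbach_alpha(items: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical raw answers to the three future time-orientation items (1-4).
raw = [[1, 2, 1], [4, 4, 3], [2, 2, 2], [3, 4, 4], [1, 1, 2]]
recoded = [[reverse_code(x) for x in row] for row in raw]
scores = [mean(row) for row in recoded]  # higher = more future-oriented
alpha = cronbach_alpha(recoded)
print(round(alpha, 2))
```

Because all three items are reversed together, reverse-coding changes the direction of the scores but not alpha.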
Religiousness-related attributes
Religiousness attributes encompassed service attendance, religiosity, and locus of control. Participants indicated how often they attended religious services (Never to More than weekly) and how religious they were (1 = Not at all, 4 = Very). In addition, participants answered five questions about their locus of control on three separate dimensions. Three items assessed passive locus of control, for example, “In your daily life, how often do you ask yourself what your religious or spiritual beliefs suggest you should do?” (1 = Often, 4 = Never; α = 0.78). One item assessed the collaborative dimension: “I work together with God as partners” (1 = None, 4 = A great deal). Finally, internal health locus of control was measured with the item “Keeping healthy depends on things that I can do” (1 = Strongly agree, 7 = Strongly disagree).
Mammogram attendance
While women provided all personality and religiousness information during MIDUS II, they reported their mammogram attendance over the preceding 2 years in the follow-up Biomarker 4 module. The dependent variable was coded as having or not having had a mammogram within the last 24 months (1 = attenders vs 0 = non-attenders).
Statistical analyses
Attributes associated with having or not having a mammogram within the last 24 months were analyzed first with logistic regression (Table 1) and then with decision trees (i.e. classification and regression trees (CART); Figure 1). In both types of analyses, models using (a) demographic attributes, (b) personality and religiousness attributes, and (c) a combination of all attributes were developed and compared. To compare the performance of the individual regression models and trees, measures of non-attender and attender predictive ability were constructed (Table 2). Non-attender predictive ability was calculated by dividing the number of non-attending women correctly predicted by each model by the total number of non-attending women in the sample. Attender predictive ability was calculated in the same way, dividing the number of correctly predicted attending women by the total number of attending women (see Table 2). Finally, random-forest analyses were conducted to verify the findings of the decision trees (Breiman, 2001; Strobl et al., 2009; Figure 2).
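The two Table 2 indices can be written as a small function. With attendance (coded 1) treated as the positive class, the attender index is the sensitivity and the non-attender index is the specificity, each expressed as a percentage. The labels below are hypothetical.

```python
from typing import Sequence, Tuple


def predictive_ability(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Percent of non-attenders (0) and of attenders (1) that a model classifies correctly.

    Assumes both classes occur in y_true.
    """
    non_att = [p for t, p in zip(y_true, y_pred) if t == 0]
    att = [p for t, p in zip(y_true, y_pred) if t == 1]
    pct_non = 100 * sum(1 for p in non_att if p == 0) / len(non_att)
    pct_att = 100 * sum(1 for p in att if p == 1) / len(att)
    return pct_non, pct_att


# Hypothetical observed and predicted attendance for ten women.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
non_pct, att_pct = predictive_ability(y_true, y_pred)  # 25.0 and about 83.3

# A model that predicts attendance for every woman.
all_attend = predictive_ability(y_true, [1] * len(y_true))  # (0.0, 100.0)
```

The last call shows how a model can score 0.0 percent on non-attenders: predicting attendance for every woman yields 100 percent on attenders and 0 percent on non-attenders, so a high attender index alone says little when most women attend.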

Figure 1. Mammogram attendance decision trees: (a) demographic tree, (b) personality and religiousness tree, and (c) integrative tree.
Table 2. Comparison of analytic models predicting mammogram attendance.
Non-attenders correctly predicted (%): the number of non-attenders correctly predicted by the model divided by the actual number of non-attenders, multiplied by 100.
Attenders correctly predicted (%): the number of attenders correctly predicted by the model divided by the actual number of attenders, multiplied by 100.

Figure 2. Variable importance for mammogram attendance.
Results
Descriptive information
Mammogram attendance within the past 24 months was 81 percent (see Table 1). Women were predominantly non-Hispanic White (73%), with an average age of 57.3 years (standard deviation (SD) = 10.48). Forty-five percent of the women had a college degree. Furthermore, they were particularly conscientious (range = 1–4, M = 3.45; SD = 0.47), religious (31% very religious and 47% somewhat religious), and high in internal locus of control (range = 1–7, M = 6.45; SD = 0.94).
Logistic regression
To examine attributes associated with having or not having a mammogram within the last 24 months, we fitted univariate models and three multivariate logistic regression models: (a) a baseline demographic model, (b) a personality and religiousness model, and (c) an integrative model (see Table 1). In univariate models, only age, conscientiousness, and future time-orientation predicted mammogram attendance: older women (60–69 years: odds ratio (OR) = 2.07, p < 0.05; above 70 years: OR = 3.04, p < 0.05), more conscientious women (OR = 2.00, p < 0.01), and more future time-oriented women (OR = 1.59, p < 0.01) were more likely to have a mammogram. These same attributes and relationships held in all multivariate models. Interestingly, religiousness attributes were not significant in any model (see Table 1 for models, ORs, and significance values).
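For reference, each multivariate model has the standard logistic form below, in which Y_i = 1 if woman i attended, x_ij are the entered attributes (with dummy codes for categorical attributes such as age group), and each reported OR is an exponentiated coefficient. Interactions enter only as explicitly specified product terms, such as β_jk x_ij x_ik, which is why they must be chosen a priori.

$$\log \frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = 1)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad \mathrm{OR}_j = e^{\beta_j}$$

For example, assuming conscientiousness was entered as a continuous score, OR = 2.00 corresponds to β = ln 2 ≈ 0.69, that is, a doubling of the odds of attendance for each one-point increase on the 1–4 scale.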
Overall, logistic regression models performed very poorly in predicting non-attenders, correctly classifying 0.0 percent with demographics, 3.3 percent with personality and religiousness, and 4.4 percent with the full model. Thus, these models offered little insight into non-attending women, which motivated the use of decision trees.
Decision trees
We developed three different decision trees with attribute sets identical to the ones used in the logistic regression models. To avoid overfitting the data, we did not allow the trees to grow more than five levels and pruned them as recommended (Breiman et al., 1984).
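Breiman et al. (1984) recommend cost-complexity pruning. Assuming that procedure was used, each subtree T of the fully grown tree is scored by the criterion below, where R(T) is its misclassification error, |T̃| is its number of terminal nodes, and α ≥ 0 penalizes tree size; α is commonly chosen by cross-validation, and the value used here is not reported.

$$R_\alpha(T) = R(T) + \alpha\,\lvert \tilde{T} \rvert$$

Larger values of α favor smaller trees, trading a slight increase in misclassification for a simpler and more stable profile.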
Demographics decision tree
While the demographics-based logistic regression model did not correctly predict any non-attenders, the decision tree correctly classified 17.6 percent of them (see Figure 1(a)). In this tree, the first split separated less adherent women aged 45 years and younger from the rest of the sample. These younger women (Node 1) were further split by education: women aged 45 years and younger without a college degree were the most vulnerable group, screening at the lowest rate (48%). For younger women, a college degree appeared to be protective (48% vs 82%).
Personality and religiousness decision tree
As in logistic regression, only conscientiousness and future time-orientation entered the tree constructed with personality and religiousness attributes (see Figure 1(b)). Whereas the personality and religiousness logistic regression model correctly predicted only 3.3 percent of non-attenders, this simple tree correctly predicted 22.0 percent. In this tree, present time-oriented women (Node 1) attended less than women with a more future time-oriented outlook (Node 2), and the present time-oriented group was not divided further. Among women who were not especially present time-oriented (Node 2), conscientiousness made a significant difference: those low in conscientiousness screened less (57% attendance in Node 3 vs 83% in Node 4).
Integrative decision tree
Entering all attributes at once, as in the full logistic regression model, produced a final tree with nine terminal nodes (shaded in gray) containing groups of women whose attendance rates ranged from 0 to 100 percent (see Figure 1(c)). In addition to three homogeneous, or pure, nodes (Nodes 3, 7, and 14), this tree contained two curvilinear relationships, involving age and neuroticism. Besides neuroticism, the tree identified passive locus of control as a predictor of screening, a relationship missed by the logistic regression models. While the full logistic regression model correctly predicted only 4.4 percent of non-attenders, the integrative decision tree correctly predicted 42.9 percent of non-attenders and 89.6 percent of attenders.
The first split was on future time-orientation. For women very low in future time-orientation (Node 1), age was the most important factor for screening likelihood and had a curvilinear relationship with it: 0 percent of women aged 45 years and younger (Node 3) screened, compared with 100 percent of women aged 45–55 years (Node 7). Women over 55 years of age (Node 8) had a screening rate of 56 percent, higher than the youngest group but lower than the middle age group.
For women who were not extremely present time-oriented (Node 2), age was also important. Among those under 60 years of age, conscientiousness made a significant difference (Nodes 9 and 10). In the more conscientious group (Node 10), screening rates did not vary further with other attributes, whereas in less conscientious women, neuroticism and passive locus of control made a difference. Neuroticism, in particular, had a curvilinear relationship with attendance (see Nodes 11, 15, and 16). Finally, the split on passive God locus of control produced a pure node (Node 14) in which 100 percent of women attended; these women were under 60 years of age, not extremely present time-oriented, less conscientious, lower in neuroticism, and high in passive God locus of control. This integrative decision tree was superior to all other decision trees and logistic regression models presented.
Model comparisons
Next, we compared the predictive ability of the decision trees and logistic regression models (Table 2). All logistic regression models, even those with interactions similar to the decision tree structures, performed poorly in correctly identifying women not attending mammogram screening. Overall, models incorporating personality and religiousness outperformed those based solely on demographics. Taking personality attributes into account substantially improved the prediction of non-attenders, identifying 22.0 percent without age and 42.9 percent when age was combined with personality.
Validation of the decision tree: random forests
To assess the robustness of the final tree and to determine whether an important attribute or relationship might have been missed, we conducted a supplementary random-forest analysis (Breiman, 2001; Strobl et al., 2009). A random forest is a collection of trees. In a forest, not only the samples but also the predictors vary from tree to tree, allowing attributes formerly overshadowed by stronger predictors to contribute. Consequently, random forests provide a measure of variable importance across a multitude of trees. If an attribute consistently finds its place in the trees, even when hundreds of trees are grown from varied samples and predictor sets, there can be little doubt of its importance as a predictor. Thus, compared with logistic regression, decision trees and random forests make it harder to miss important predictors.
Specifically, we grew a random forest of 1000 trees and calculated variable importance. Each tree used a maximum of five predictors and up to 10 levels. Minimum parent and child node sizes were set at 10 and 5, respectively, representing approximately 2 and 1 percent of the sample. Variable importance was calculated as the percentage of trees in which each attribute appeared as one of the five predictors.
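The two sources of randomness and the appearance-based importance score can be sketched in Python. Fitting each individual tree is left out; the code assumes that step returns the set of attributes used in the tree’s splits. It also assumes each tree draws up to five candidate predictors at random once per tree, which is our reading of the settings above (Breiman’s (2001) original algorithm redraws candidates at every split). The predictor names follow the Measures section, the seed and the usage sets are hypothetical, and this frequency score differs from the permutation importance often reported for random forests.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

PREDICTORS = [
    "age", "education", "family history", "race",
    "conscientiousness", "neuroticism", "fatalism", "future time-orientation",
    "service attendance", "religiosity",
    "passive locus", "collaborative locus", "internal health locus",
]


def draw_tree_inputs(n_obs: int, predictors: Sequence[str], max_predictors: int,
                     rng: random.Random) -> Tuple[List[int], List[str]]:
    """Draw one tree's bootstrap sample (row indices) and its random predictor subset."""
    rows = [rng.randrange(n_obs) for _ in range(n_obs)]  # sampling with replacement
    cols = rng.sample(list(predictors), k=min(max_predictors, len(predictors)))
    return rows, cols


def appearance_importance(used_per_tree: Iterable[Set[str]]) -> Dict[str, float]:
    """Percentage of trees in which each attribute is used in at least one split."""
    trees = list(used_per_tree)
    counts = Counter(attr for used in trees for attr in used)
    return {attr: 100.0 * c / len(trees) for attr, c in counts.most_common()}


rng = random.Random(2015)
rows, cols = draw_tree_inputs(474, PREDICTORS, max_predictors=5, rng=rng)

# Hypothetical attribute sets returned by four fitted trees.
used = [{"age", "conscientiousness"}, {"age"}, {"age", "neuroticism"}, {"conscientiousness"}]
importance = appearance_importance(used)  # age 75.0, conscientiousness 50.0, neuroticism 25.0
```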
Within the 1000-tree forest (Figure 2), the five most important attributes were identical to those in the full tree. Age was the attribute most important to mammogram attendance, followed by conscientiousness, future time-orientation, neuroticism, and passive locus of control. Except for age, all demographic attributes had weaker importance than the personality attributes, supporting the usefulness of predicting non-attendance from personality and religiousness attributes.
Discussion
This study addressed mammogram non-attendance among women aged 41 years and over. Because having mammograms is now the norm, understanding the attributes of women who do not screen has become critical. Unlike past research, which typically focused on demographic attributes, this study examined the relevance of religiousness and of personality attributes such as conscientiousness, time-orientation, and neuroticism. As expected, relative to the demographic baseline, focusing on personality attributes substantially improved the identification of non-attending women.
Adding personality and religiousness attributes to demographics changed the classification ability of logistic regression only slightly. Decision trees, however, allowed for complex relationships and, by capturing them, greatly increased the identification of non-attenders. Whereas logistic regression implicated only age, time-orientation, and conscientiousness as important factors, tree analyses revealed relationships involving several attributes not seen in the logistic models (e.g. whether or not a woman has a college degree, level of neuroticism, and passive God locus of control). Furthermore, decision trees identified two curvilinear relationships, involving age and neuroticism, while producing a simple diagram with distinct cut-points useful for interventions. These findings suggest that, in this context, decision trees can be used in place of traditional classification methods such as logistic regression, not merely as a supplement to them.
The integrative decision tree identified groups of women whose attendance ranged from 0 to 100 percent. Women in two groups screened at 100 percent. The first group consisted of very present-centered women aged 45–55 years; younger women with similar time-orientation scores, however, had 0 percent attendance. The second group with 100 percent attendance included women below 60 years of age who were not very present time-oriented, below average in both neuroticism and conscientiousness, and higher in passive God locus of control. The literature suggests that a higher passive locus of control may impede screening (Holt et al., 2007); however, the decision tree revealed that women who were similar in all other characteristics but lower in passive locus of control had the second lowest attendance rate of all groups. Perhaps passive locus of control appears harmful only when other personality attributes are not taken into account.
One additional unexpected relationship emerged from the tree. Whereas neuroticism was expected to impede attendance, the integrative tree revealed a surprising curvilinear relationship where an optimal level of neuroticism was associated with higher attendance. This finding calls for further research as it seems to run counter to earlier findings on fear appeals (Witte and Allen, 2000).
Contributions, limitations, and applications
In exploring interactions among mammogram predictors with decision trees, this study highlights the importance of personality and religiousness attributes. These interactions allowed counterintuitive relationships to emerge, showing passive locus of control to be a facilitator of screening and revealing an optimal level of worry not previously reported.
With the current factors, the tree’s overall performance was far superior to that of logistic regression. The tree identified new predictors along with non-linear relationships. Furthermore, it contained three groups of women who were identical in their screening behavior (pure nodes; see Nodes 3, 7, and 14). The random-forest analysis supported the robustness of these conclusions. Nonetheless, the sample size was relatively small (N = 474), with only about 19 percent of women not attending (n = 91), so the pure nodes contain few women and offer prime targets for future study. With a larger sample, future work analyzing the behavior and motivations of these specific women could add to the mammogram attendance literature. Research focusing on pure nodes of non-attending women would be especially fruitful for boosting mammogram attendance and reaching current national goals.
While MIDUS is a large dataset, variables such as mammogram knowledge, doctor recommendation, and screening history beyond the last mammogram were not assessed. Whereas more global features like neuroticism and God locus of control were assessed in MIDUS, more specific measures like breast cancer fear and God locus of health control could aid in further understanding the mechanisms of mammogram attendance (Champion et al., 2004; Holt et al., 2007). Additionally, insurance status was assessed in MIDUS II, with an average lag of 27 months between its measurement and the assessment of mammogram status. Adding a concurrent measure of insurance status and the other factors mentioned above would likely increase predictive ability.
While more factors could have been added, one strength of the study lies in the ease of attribute assessment. Besides age, only 15 questions are needed to construct the integrated decision tree (see online Supplementary Material). With the ability to assess psychological attributes with only a few items (Stephenson et al., 2003), the number of questions may be further reduced and thus could be easily incorporated into national surveys, online assessment tools, or patient intake forms.
A final contribution is an easily applicable tree that greatly improves the identification of non-attenders and provides avenues for future interventions tailored to salient personality attributes positively associated with attendance. Physicians could use such models to identify women in their practices who would benefit from mammography, predict their likelihood of screening, and focus more effort on women in the lowest-attending groups. For instance, for low-attending women who are very present time-oriented, interventions to increase future time-orientation or to emphasize long-term gains may be helpful (Nodes 3 and 8). Past research has shown that time perspective can be increased through interventions and that increases in future time-orientation can lead to positive health behaviors (Hall and Fong, 2003; Marko and Savickas, 1998). Additionally, positive affect may help people think less about short-term costs and more about long-term gains; thus, inducing a positive mood in these women while discussing long-term gains may make them more receptive to screening (Aspinwall, 2005). Because time-orientation has been implicated as an important factor in other cancer screenings, such as cervical cancer screening, brief interventions making women less present time-oriented may have a spillover effect into other screening domains (Roncancio et al., 2014).
Furthermore, for low-attending women who are younger than 60 years of age, not very present time-oriented, and lower in conscientiousness, interventions focusing on fear and worry may promote screening (Nodes 13 and 16). Fear and worry can be induced; however, an optimal amount of fear has never been identified (Witte and Allen, 2000). It is possible that optimal levels of fear, anxiety, and worry exist only in a subgroup of people, while others have too much worry or fear of cancer (Clarke and Everest, 2006). Thus, interventions elevating concern for breast cancer in women with lower neuroticism, coupled with reassurance about the efficacy of screening tests, may increase attendance. For women with higher levels of neuroticism, the same reassurance without increased threat may strengthen their coping abilities and motivate screening (Ruiter et al., 2003). Although the model indicates that women who allocate more control of their lives to God attend more than women who rely less on God, bolstering this construct may not work in women who are not religious and, until studied further, could adversely affect other health domains (Allen et al., 2014). Options such as these provide alternatives for personality-focused interventions aimed at increasing not only mammogram screening but also other healthy behaviors.
Mammography has long been viewed as a proven tool for detecting breast cancer early, when it is more treatable (Berry et al., 2005); however, recent reports have highlighted both the potential for overdiagnosis of breast cancers not needing treatment and a possibly lower estimated reduction in breast cancer mortality (Bleyer and Welch, 2012; Miller et al., 2014). Even with these findings, large agencies currently still advocate high levels of national screening (Smith et al., 2013; US Department of Health and Human Services, 2015). While blanket recommendations still prevail, trends are moving toward more individualized recommendations for mammogram attendance that involve discussing both the benefits and risks of screening with healthcare professionals (Pace and Keating, 2014). Given this need for personalized recommendations, decision trees may be the right tool to help customize this information for individual women. Furthermore, collecting information on personality attributes as part of patient intake in a medical setting could become as routine as assessing other demographic factors related to health.
To meet future mammogram screening goals, new strategies that uncover the attributes of women not having mammograms are necessary. These strategies include integrating personality and religiousness attributes and identifying complex relationships. Despite the complexity of this task, the outcome must be easily interpretable and easily applicable to be useful. Profiles such as the one created here capture synergies among the attributes of mammogram attendance while retaining practical utility and providing new intervention opportunities.
Footnotes
Acknowledgements
The authors thank Alper Sen, Lemi Baruh, and Zeynep Cemalcılar for their helpful comments on this paper.
Funding
This research was supported by TUBA-GEBIP funds from the Turkish National Academy of Sciences awarded to Tarcan Kumkale.
