Abstract
This study examines psychometric properties of the 6-item Nature-Relatedness Scale (NR-6) using Rasch analysis. Nature relatedness is an important indicator of several markers of human and ecological well-being, and the NR-6 is a commonly used, convenient measure. However, the NR-6 is an ordinal scale with limited suitability for use with parametric statistics due to inconsistency with fundamental measurement principles. The current Rasch analysis employed the partial credit model using a sample (n = 450) collected from the United States. Initial analysis showed local dependency, low reliability, and poor fit to the Rasch model. The best model fit (χ2[24] = 29.30, p = 0.209) and reliability (Person Separation Index = 0.74) were achieved by combining locally dependent items, which reduced measurement error. These findings support the psychometric strength of the NR-6 when used with the included ordinal-to-interval conversion table, which improves precision without altering the original scale format.
Introduction
Evidence from the past decade points to the importance of nature relatedness to human well-being. Nature relatedness is defined as the cognitive, affective, and experiential connection to the natural environment (Mayer, Frantz, Bruehlman-Senecal, & Dolliver, 2009). A range of positive associations have been found between the construct and aspects of human well-being. These include lower trait cognitive and somatic anxiety (Lawton, Brymer, Clough, & Denovan, 2017), increased positive affect, life satisfaction (Sadowski, Böke, Mettler, Heath, & Khoury, 2020), psychological well-being (Redondo, Valor, & Carrero, 2022; Wang, Hong, Lin, & Tsai, 2022), higher levels of physical activity (Puhakka et al., 2018), and a more diverse diet containing more fruits and vegetables (Miliron et al., 2022).
A recent investigation found the relationship that life satisfaction had with multiple mental health variables such as depression, anxiety, and stress was moderated by nature relatedness (Phuoc Nguyen & Nguyen, 2022). The importance to physical health was also demonstrated in a 2018 study showing a positive association of nature relatedness with a self-report health item that possesses predictive validity for health outcomes, mortality, and recovery from illness (Bowling, 2005; Dean et al., 2018).
There are a number of reasons why nature relatedness is associated with so many physical and psychological health outcomes. From an evolutionary perspective, humans possess an innate tendency to connect with nature because of its resources for survival (Ulrich, 1993), and nowadays nature often provides the backdrop for recreational activities and social gatherings (Chang et al., 2020). Moreover, nature can help to restore attention and other psychological resources, which are depleted in urban settings. Nature enables individuals to engage in effortless attention, which can mitigate and prevent stress and support recovery (Kaplan, 1995).
Studies investigating the effect of nature on psychological and physical health commonly employ the Nature-Relatedness Scale (NR-6), a 6-item Likert-type scale with response options ranging from strongly disagree to strongly agree that assesses a participant's self-reported perception of their connection to nature (Nisbet & Zelenski, 2013). The brief NR-6 provides an advantage over its longer versions as it can be applied to a wider range of research contexts. This is especially important in clinical settings or in research testing preliminary associations between nature relatedness and unfamiliar areas. Recently the NR-6 was used to evaluate the effects of a nature walk intervention on rumination, a factor important to reducing clinical depression (Lopes, Lima, & Silva, 2020).
The NR-6 has had its psychometric properties analyzed by its creators under a classical test theory (CTT) approach, who found a unidimensional factor structure, as well as good internal consistency (Cronbach's α = 0.83), predictive validity with positive affect, and convergent validity to both the 21-item version and the inclusion of nature in self scale (Nisbet, Zelenski, & Murphy, 2009; Schultz, 2001). Recent psychometric investigations into the NR-6 have yielded positive results, with Cronbach alpha values ranging from 0.85 to 0.89, and a 2022 confirmatory factor analysis demonstrated clear unidimensionality and strong factor loadings (Aruta & Pakingan, 2022; Redondo et al., 2022; Richardson, Hussain, & Griffiths, 2018).
However, another recent investigation noted issues with reliability and measurement equivalence or item bias, and indicated that a multidimensional model had a superior fit (Luong, 2022). We suspect issues of local dependency may be the reason for these findings, and if this is the case, corrections through subtest modification will be sought. To assess these conflicting findings and improve the precision of the NR-6, a modern test theory approach, specifically Rasch analysis, can be used.
The lack of precision attributable to ordinal measures such as the NR-6 is problematic. Ordinal measures violate the fundamental assumptions of parametric tests and lack concatenability, preventing addition or subtraction, as well as valid comparisons within or between the groups tested. Comparisons with interval/ratio level physiological measures such as heart rate, respiratory rate, skin conductance (voltage), blood pressure, or salivary cortisol (nmol/L) are also stymied when using ordinal raw scores (Allen & Yen, 2002; Stucki, Daltroy, Katz, Johannesson, & Liang, 1996).
Furthermore, ordinal raw scores treat every item as contributing equally to the latent variable; because differences between items are not weighted, total scores lose coherence and accuracy. For example, under classical theory, agreement with item 4 “I take notice of wildlife wherever I am” is given the same weighting as agreement with item 6 “I feel very connected to all living things and the earth.” The fact that item 4 should contribute less to the latent trait than item 6 is not conveyed in the final tally. Rasch analysis accounts for this because scores are a logarithmic function of an item's difficulty and a person's ability, inherently providing items with a weighting that is taken into account in the total score.
Rasch analysis is an iterative process characterized by breaking a scale down into several attributes and testing the key elements of its function: invariance of measurement between groups/group characteristics, a unidimensional factor structure, and equal units of measurement along the scale continuum (Tennant & Conaghan, 2007). Rasch analysis is a widely used technique in the field of education that is becoming increasingly popular in fields such as medicine, psychology, agriculture, public health, and more, for its utility to examine and enhance psychometric properties of ordinal scales (Hobart & Cano, 2009).
The main benefit of Rasch analysis over CTT methods is that if the data show an acceptable fit to the model, conversion tables can be created to transform ordinal scores to an interval-level metric. This increases the accuracy of the measure and produces data that no longer violate the assumptions underlying common statistical tests such as the t-test, analysis of variance, linear regression, and Pearson correlation. Norquist, Fitzpatrick, Dawson, and Jenkinson (2004) laid out the impact of ordinal-to-interval improvements made to a scale using Rasch analysis, in terms of the improved precision valued by researchers and clinicians.
Recent publications involving Rasch analysis have included these effective and clinically useful tables (Medvedev, Krägeloh, Titkova, & Siegert, 2018; Medvedev, Pratscher, & Bettencourt, 2020), largely due to their practical benefits (Árnadóttir & Fisher, 2008; Tennant & Conaghan, 2007), as well as the recommendation for inclusion by Leung, Png, Conaghan, and Tennant (2013). The aim of this study was to examine the psychometric properties of the NR-6 and, if possible, improve its precision through the transformation of ordinal scores to interval-level data using Rasch model estimates. Production of a conversion table would enable researchers and clinicians to more accurately compare and correlate nature-relatedness scores with interval-level outcome data from fields such as psychology, health, and medical science.
Methods
Participants
Data were collected by the Open Source Psychometrics Project (https://openpsychometrics.org/) in November 2018 and comprised 450 internet users from the United States, not balanced for age or gender. For our sample (n = 450), the mean age was 33 years (standard deviation [SD] = 14.37), ages ranged from 18 to 78 years, and gender consisted of 30.2% male, 65.8% female, and 4% other respondents. Sample size was appropriate for Rasch analysis of a 6-item scale as per recommendations for achieving proper item calibrations while minimizing type I errors (Finaulahi, Sumich, Heym, & Medvedev, 2021; Hagell & Westergren, 2016; Linacre, 1994). Ethics approval was obtained from the University of Waikato (FS2021-57) and the data set is freely available in the public domain.
Procedure
Participants made their way to openpsychometrics.org through Google searches for terms such as “psychometrics,” “personality tests,” and “open-source psychology,” or otherwise through a friend's referral or social media. Respondents were not compensated or otherwise incentivized outside of their own curiosity to complete questionnaires.
Each respondent was made aware before the start of the survey that responses given could be included in data analysis, journal articles, or other research, and was asked to indicate whether they had given honest, accurate answers that may be used in research; answering no automatically removed that respondent from inclusion in the final data set (Open Psychometrics, 2018). Data were split into three age groups (18–25, 26–35, and 36–78 years) for later testing of potential differential item functioning (DIF).
Measures
The NR-6 is a 6-item self-report Likert-scale questionnaire measuring the latent trait “connectedness to nature” (Nisbet & Zelenski, 2013). The scale includes response options from 1 = strongly disagree to 5 = strongly agree. No items used reverse coding. The scale demonstrated strong reliability with the current data set: Cronbach's alpha of 0.81 and McDonald's omega of 0.83.
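For readers who wish to reproduce the classical reliability estimate on their own NR-6 data, the following is a minimal sketch of Cronbach's alpha. The function name and the respondents-by-items array layout are assumptions made for illustration; this is not the SPSS procedure used in the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: 2-D array-like, one row per respondent, one column per item
    (hypothetical layout; complete data assumed).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Alpha approaches 1 as items covary more strongly relative to their individual variances, which is also why locally dependent items can inflate it.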
Data analyses
Descriptive statistics and Polytomous Rasch analysis of the NR-6 were completed using IBM SPSS v26 and RUMM2030 (Andrich, Sheridan, & Luo, 2009), respectively. For assisting in determining model choice of either the rating scale model (Andrich et al., 2009), or the partial credit model (PCM; Masters, 1982), a likelihood ratio test (Andersen, 1973) was carried out. Results showed a significant difference between response option thresholds across individual items; therefore, the unrestricted PCM version of the Rasch model was used for the current data set.
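For reference, the partial credit model can be written as follows. This is the standard formulation from Masters (1982); the symbols are notational assumptions not used elsewhere in this article: \theta_n is the location of person n, \delta_{ik} is the k-th threshold of item i, and m_i is the highest score category of item i.

```latex
% Partial credit model: probability that person n scores x on item i,
% for x = 0, 1, ..., m_i
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=0}^{x} (\theta_n - \delta_{ik}) \right)}
       {\sum_{h=0}^{m_i} \exp\left( \sum_{k=0}^{h} (\theta_n - \delta_{ik}) \right)},
\qquad \text{where } \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0 .
```

The rating scale model is the special case in which the thresholds decompose as \delta_{ik} = \delta_i + \tau_k, with the same \tau_k for every item; the likelihood ratio test described above assesses whether that restriction is tenable.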
This study evaluated the overall Rasch model fit and assessed psychometric properties of the NR-6 through RUMM2030-specific descriptive statistics outlined by Wright, Linacre, Gustafson, and Martin-Lof (1994), as well as guidelines outlined by Tennant and Conaghan (2007). With regard to item targeting, an average item difficulty location within ±0.50 logits is the criterion indicative of a properly targeted sample. This was compared against the mean item difficulty location in our sample. Item and person fit residuals (standardized indicators of model fit) show perfect model fit with a mean of 0 and an SD of 1. Typically, acceptable model fit is shown when infit and outfit Z values fall between −2.5 and +2.5; values outside this range that are significant at the p < 0.05 level indicate misfit.
Disordered category thresholds are observed when item responses do not correspond with an increase/decrease in the level of the latent trait, and are defined as the disordering of the substantive meanings of the categories (e.g., rating life satisfaction on a 0–100 scale). Disordered category thresholds were examined and, if found, were addressed by collapsing adjacent categories or removing items. DIF was assessed in relation to demographic factors. Reliability was tested using the Person Separation Index (PSI), with a minimum acceptable value of 0.70 for group use and 0.85 for individual use (Tennant & Conaghan, 2007).
Local dependency was examined through the residual correlation matrix, and values exceeding the overall mean by >0.20 were assessed for removal or creation of a subtest (Christensen, Makransky, & Horton, 2016; Tennant & Conaghan, 2007). Unidimensionality was assessed through a principal component analysis of Rasch residuals, also known as a “Rasch Factor Analysis” (Bond, Yan, & Heene, 2021; Linacre, 1998; Wright, 1996), as well as an equating t-test. Unidimensionality is demonstrated by a lack of significant differences between participants' scores on items with high and low loadings on any factor other than the construct under measurement (e.g., nature relatedness). Our sample size meets the recommendation outlined by Linacre (1994). Ordinal-to-interval transformation tables were created for ease of use for practitioners.
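The local dependency criterion described above (a residual correlation exceeding the mean residual correlation by more than 0.20) can be sketched as follows. This is an illustrative sketch only: it assumes standardized Rasch residuals have already been exported as a persons-by-items array, and the function name and argument names are hypothetical rather than part of RUMM2030.

```python
import numpy as np

def flag_local_dependency(residuals, margin=0.20):
    """Return item pairs whose residual correlation exceeds mean + margin.

    residuals: persons-by-items array of standardized Rasch residuals
    (assumed to be exported from the Rasch software beforehand).
    """
    residuals = np.asarray(residuals, dtype=float)
    corr = np.corrcoef(residuals, rowvar=False)   # item-by-item correlations
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    cutoff = corr[rows, cols].mean() + margin     # mean off-diagonal + 0.20
    # Report pairs with 1-based item numbers and their residual correlation.
    return [
        (int(i) + 1, int(j) + 1, float(corr[i, j]))
        for i, j in zip(rows, cols)
        if corr[i, j] > cutoff
    ]
```

Flagged pairs are candidates for a subtest (super-item) only when combining them also makes conceptual sense.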
Results
Model fit statistics are presented in Table 1. Initial analysis indicated a poor overall fit to the Rasch model as well as low reliability, as evidenced by a significant item-trait interaction chi-square (p < 0.01) and a PSI value of 0.61 (Table 1). Person fit measures were not a cause for concern in our initial analysis (mean [M] = −0.25, SD = 0.75) and stayed satisfactory through our second and final analyses (Table 1).
Table 1. Summary of the Rasch Model Fit Statistics for the Initial and Final Analysis of the Nature-Relatedness Scale (n = 450)
PSI, Person Separation Index without extremes; SD, standard deviation.
Item fit residuals are presented in Table 2 and showed poor model fit with two misfitting items: item 2 (2.54) (“I always think about how my actions affect the environment”) and item 5 (−3.90) (“My relationship to nature is an important part of who I am”). Other notable values included item 1 (2.10) (“My ideal vacation spot would be a remote, wilderness area”) and item 3 (−2.20) (“My connection to nature and the environment is a part of my spirituality”). The sample mean (2.24; SD = 1.59) was higher than the item mean, indicating that the majority of the sample have higher levels of the latent trait. Although the sample abilities were well covered by item thresholds, there were signs of a ceiling effect suggesting that easy items may be over-represented in the scale.
Table 2. Initial and Final Item Fit Statistics of the Nature-Relatedness Scale (n = 450)
Significant misfit p < 0.05.
Disordered thresholds were apparent and are shown in Figure 1, where category 2 is not functioning correctly and never becomes modal. No evidence for DIF was found for either age or gender. Strong evidence for strict unidimensionality was shown in the results of Smith's unidimensionality test (principal components analysis of residuals and equating t-tests), and this was confirmed for all three analyses conducted (initial, second, and final) (Table 1). Local dependency was identified as an issue, with examination of the residual correlation matrix showing residual correlations between items 1, 2, 3, and 4 that exceeded the magnitude of 0.20, indicative of local dependency (Christensen et al., 2016; Tennant & Conaghan, 2007).

Figure 1. Item characteristic curve for item 1 before (above) and after (below) rescoring.
Modifications were made in the form of super-item 1 by combining items 1 and 6, and super-item 2 as a combination of items 2 and 3. This was carried out by evaluating conceptual meaning and size of residual correlation (Lundgren-Nilsson et al., 2013; Sandham et al., 2019). In addition, the Likert stem response options 2 and 3 were combined (they, respectively, correspond to response options 1 and 2 in Figure 1A, as RUMM2030 automatically converted response option 1 into 0, option 2 into 1, option 3 into 2, and so on).
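Creating a super-item amounts to summing the scores of the locally dependent items so that they enter the model as a single polytomous item. The sketch below illustrates this for the item groupings reported above; the dictionary layout and function name are hypothetical, and in practice the subtests were defined within RUMM2030.

```python
# Item groupings as reported in the text: super-item 1 = items 1 and 6,
# super-item 2 = items 2 and 3.
SUPER_ITEMS = {"super_item_1": (1, 6), "super_item_2": (2, 3)}

def build_subtests(responses):
    """Combine item scores into super-items; other items pass through.

    responses: dict mapping item number (1-6) to that item's score
    (hypothetical layout).
    """
    combined = {i for members in SUPER_ITEMS.values() for i in members}
    subtests = {
        name: sum(responses[i] for i in members)
        for name, members in SUPER_ITEMS.items()
    }
    for item, score in responses.items():
        if item not in combined:
            subtests[f"item_{item}"] = score
    return subtests
```

Because the super-items are simple sums, a respondent's total raw score is unchanged by the modification; only the unit of analysis changes.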
After modifications, our final analysis showed no signs of local dependency, in addition to a satisfactory overall model fit and reliability as evidenced by a nonsignificant item-trait interaction chi-square of p = 0.209, and a PSI value of 0.74 (Table 1). This improved PSI reflects the scale acceptability for use in group assessment. Satisfactory model fit was also shown through item fit residuals, which improved to no misfitting items, with figures ranging from −1.80 to 0.87 (Table 2, Final analysis).
One minor drawback was a slight worsening in sample targeting, with super-item 1 and super-item 2 falling marginally outside the ±0.50 range, and item 4 worsening from −1.31 to −1.58 (Table 2). The disordered thresholds previously shown in Figure 1A were largely resolved, as shown in Figure 1B. This strategy resulted in the best model fit without removing items.
Figure 2 shows the person-item threshold distribution of the modified NR-6 after ordering the disordered thresholds and combining locally dependent items into two super-items (Table 2, Final analysis). The item thresholds and person ability levels on the latent factor measured by the NR-6 are plotted using the same metric in logit units. The distribution of persons is shifted toward the upper end of the scale, indicative of a ceiling effect.

Figure 2. Person-item threshold distribution for NR-6 (Final Analysis).
The item map (or “Wright map”) in Figure 3 shows super-item 1 and super-item 2 stacking, indicating that they possess a very similar level of difficulty and target the same areas of person ability. Items 5 (ST004) and 6 (ST003) are shown below super-item 1 and super-item 2 on the x-axis, displaying a lower level of difficulty than both super-items.

Figure 3. Item threshold map showing item locations after modification.
Ordinal-to-interval conversion tables are provided in Table 3, a key point for practitioners and researchers, enabling conversion of ordinal raw scores to interval-level scores in either logit or scale units. Those wishing to use the NR-6 to collect data and apply this conversion table do not need to alter the original layout of the NR-6 or do any reverse coding, as the NR-6 does not require this. Researchers will only need to re-score responses to all items (1 = 1, 2 = 2, 3 = 2, 4 = 3, and 5 = 4), calculate the raw score total, and then use the table provided to convert the raw scores to logits or scale units.
Table 3. Converting from Ordinal-to-Interval-Level Scores for the Nature-Relatedness Scale
Note: This conversion table can only be used for respondents with no missing data.
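The re-scoring and lookup steps described above can be sketched in a few lines. The re-scoring map is taken directly from the text; the function names are hypothetical, and the conversion table itself must be supplied by the user from Table 3 (no table values are reproduced or assumed here).

```python
# Re-scoring map from the text: Likert options 2 and 3 are collapsed.
RESCORE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}

def rescored_total(responses):
    """Re-score six NR-6 responses (each 1-5) and return the raw total.

    Raises ValueError on missing or out-of-range responses, because the
    conversion table applies only to respondents with no missing data.
    """
    if len(responses) != 6 or any(r not in RESCORE for r in responses):
        raise ValueError("six complete responses coded 1-5 are required")
    return sum(RESCORE[r] for r in responses)

def to_interval(responses, conversion_table):
    """Look up the interval-level score for a respondent.

    conversion_table: dict mapping raw total -> logit or scale-unit value,
    to be filled in by the user from Table 3.
    """
    return conversion_table[rescored_total(responses)]
```

For example, responses of 1, 2, 3, 4, 5, 5 re-score to 1, 2, 2, 3, 4, 4, giving a raw total of 16 to look up in the table.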
Discussion
This study was undertaken to investigate the psychometric properties of the 6-item nature-relatedness scale using Rasch analysis. Results revealed a unidimensional factor structure, no evidence of measurement inequivalence, or item bias by gender and age. Person fit statistics showed no significantly abnormal response patterns. However, issues of local dependency, poor reliability, and poor global and item model fit were found.
Specifically, local dependency was found between individual items (item 1 “My ideal vacation spot would be a remote, wilderness area,” item 2 “I always think about how my actions affect the environment,” item 3 “My connection to nature and the environment is a part of my spirituality,” and item 4 “I take notice of wildlife wherever I am”), and poor reliability was evidenced with a PSI of 0.61, which needs to be at least 0.70 for reliable group assessment (Fisher, 1997). Poor global model fit was shown with a significant item-trait interaction chi-square (χ2 = 98.49, p < 0.01), meaning that the scale was not working adequately at different levels of the latent trait. Item fit statistics found items 2 and 5 misfitting by a significant degree, with items 1 (2.10) and 3 (−2.20) displaying high fit residuals, introducing error variance unrelated to the latent trait.
Super-item modifications were undertaken in an attempt to resolve scale dysfunction, in accordance with methods that aim to combine items based on residual correlations and conceptual meaning (Lundgren-Nilsson & Tennant, 2011; Sandham et al., 2019). Items 1 and 6 were combined to create super-item 1, and items 2 and 3 to create super-item 2. These item sets shared a common error variance unrelated to nature relatedness, creating spurious correlations that negatively impacted reliability. The modifications resulted in the best model fit (χ2 = 29.30, p = 0.209) and enhanced reliability (PSI = 0.74), with no signs of local dependency and no misfitting items.
Successful modifications satisfied Rasch model requirements, improving the reliability of the scale without deleting misfitting items, which were scarce enough in this already shortened scale. Furthermore, reducing items could potentially impact on conceptual importance reflected by such items, which would have a negative effect on construct validity (Finaulahi et al., 2021; Hopkins, Lyndon, Henning, & Medvedev, 2021; Medvedev, Turner-Stokes, Ashford, & Siegert, 2018). Without modifications, the NR-6 data would not fit the Rasch model well enough for our use.
To clarify, no latent measurement is ever perfectly accurate to a theoretical ideal. We do not expect a perfect fit to the Rasch model; rather, we hope our data fit the model well enough to be useful for our purposes. Hence, without the super-item modifications, our data would not provide sufficient quality of measurement for creating an interval-level scale measuring nature relatedness.
Additional observations include a skewed distribution of responses to the upper end of the latent trait constituting a ceiling effect. This may be due to a social desirability bias given that it may be psychically painful or ego-dystonic to view oneself as having “rarely considered the environment in their actions” or to “not feel very connected with all living things and the earth.”
An additional factor that may be exacerbating this effect is the increased global media coverage of climate change and its associated emphasis on planetary health, environmental consciousness, as well as the moral and practical issues of pollution (Hase, Mahl, Schafer, & Keller, 2021). This observed ceiling effect should be considered in a future study with this scale, specifically with the formulation of future similar scales adjusting to capture those higher scoring respondents with more difficult-to-endorse questions.
Our modifications enhanced reliability of the NR-6 from an unreliable PSI of 0.61 up to an adequate level of 0.74, a level necessary to discriminate persons along the latent trait (De Ayala, 2013). This figure conveys that the NR-6 is acceptable for group use but not individual use, an expected result for a short scale such as this, as the overall number of items contributes a large part of reliability. Regarding unidimensionality, our conclusions contrast with those in Luong (2022), which suggested that a multidimensional factor structure is most appropriate. Strong evidence for strict unidimensionality observed here points to the NR-6 measuring a single, distinct, and overarching latent trait.
Ordinal-to-interval conversion tables are provided that enable interval-level data to be produced from ordinal raw scores, making data suitable for use with parametric statistics (Table 3). Converted data fall in line with both the fundamental principles of measurement and the assumptions of appropriate data in parametric tests as “ordinal data should not be used in parametric statistics as they imply a false knowledge of something more than a relative rank-order” (Stevens, 1946).
Improvements in the quality of data represent a necessary advancement for clinicians and researchers investigating nature relatedness as an important factor for human health. Recent investigations could have benefited greatly from this approach, as several studies have compared nature relatedness with interval-level health outcome data using the NR-6. Morris et al. (2021) evaluated the role of a nature-based intervention to improve the well-being of cancer patients and compared NR-6 data with physiological outcome measures such as aerobic fitness, flexibility, hand-grip strength, and blood pressure.
Similarly, a cross-sectional study investigating the relationship between nature relatedness and problematic smartphone use compared smartphone use with NR-6 ordinal raw scores. Although in this case smartphone use data were self-reported, screen time use statistics are increasingly being sought to gain more accurate and objective readings that do not suffer from recall bias (Richardson et al., 2018).
With greater precision, these comparisons with interval-level data can be made coherent. The call has been made for the NR-6, due to its brevity, to be included in clinical research in testing associations with health behaviors, factors of the skin, oral and gut microbiome, rates of noncommunicable diseases, allostatic load, chronic low-grade systemic inflammation, gene expression, transcription factors, and vitamin levels (Craig, Logan, & Prescott, 2016). Through interval conversion, such associations can be meaningfully examined, and improvements to scale accuracy are an essential foundation to future piloting, trialing, and evaluating new nature relatedness-based mental and physical health interventions, therapies, or care strategies.
Limitations
Our data set was limited in generalizability because it comprised respondents entirely from the United States and did not include data on ethnic groups. Therefore, even though there was no evidence of DIF by age or gender, factors such as ethnicity, language, and nationality were not tested, and items may function differently across these groups. In addition, a ceiling effect was observed in the analysis, as well as a dearth of responses to response option 2, leading us to collapse that option. Psychometricians looking to augment existing scales or create new or similar ones can use these considerations to remedy limitations in their own investigations. Further research can explore the multilingual, multinational empirical constitution of the NR-6, and we would encourage researchers to this end.
Conclusion
Nature relatedness may be an important factor for various research designs aimed at improving physical and psychological human health, as well as our ecologically sustainable behaviors. The NR-6 is a useful short assessment that works best when survey administration is constrained by time, and can, therefore, be included in a wider array of adjacent research projects in different fields. Our article investigated the psychometric properties of the English language NR-6 in a population of adult respondents from the United States.
Results support the psychometric properties of the NR-6 scale after minor modifications as evidenced by unidimensionality, satisfactory fit to the Rasch model, and no significant local dependency. The ceiling effect found in this study directs further research toward expansion of the scale to target higher levels of the latent trait by including items of higher difficulty.
