Abstract
Mental health first aid (MHFA) courses teach community members the knowledge and skills needed to recognize and respond to mental health problems until professional help is received or the crisis resolves. This study aimed to develop a reliable and valid measure of MHFA behaviors. A pool of recommended and not recommended actions was selected from MHFA guidelines and developed into two scales measuring either intended or provided support. Items were tested with a sample of 697 adults. Item response theory guided the selection of final items. The Mental Health Support Scale (MHSS)–Intended has 23 items across two subscales and the MHSS–Provided has 12 items across two subscales. These scales demonstrated convergent validity, discrimination between respondents with and without MHFA expertise, and acceptable measurement precision across a range of skill levels. Overall, findings suggest that the MHSS is a valid and useful measure of MHFA behaviors.
Keywords
The mental health first aid (MHFA) program was developed in 2000 to empower community members to provide better initial support to people experiencing mental health problems or crises (Jorm et al., 2019a; Kitchener et al., 2017). MHFA training adapts a model familiar from physical first aid training for injuries and emergencies to mental health problems, whereby aid is provided until appropriate professional help is received or the crisis resolves. The training was developed in recognition that community members are likely to have contact with people in their social, work, and community networks with mental health problems due to their high prevalence, but that many lack the knowledge and skills needed to provide appropriate support. For example, in a suicidal crisis, it is common for members of the public to mistakenly believe it helpful to avoid asking about suicide or to remind a suicidal person what they have going for them, but these responses are not recommended by experts and are likely to be harmful (Nicholas et al., 2019). Other common harmful responses to people with a mental health problem are avoiding them, dismissing that their mental health problem is real or causes suffering, judging them for being weak, and treating them as incompetent and unable to make decisions (Morgan et al., 2017). Given the low rates of help-seeking for mental health problems (Wang et al., 2007) and that people are more likely to seek professional help when it is suggested by someone in their social network (Vogel et al., 2007), upskilling the community in how to recognize mental health problems and provide MHFA is an important facilitator of early intervention (Kitchener & Jorm, 2008).
The MHFA Action Plan, known as ALGEE, summarizes the key actions taught in the course: Approach the person, assess and assist with any crisis; Listen and communicate nonjudgmentally; Give support and information; Encourage the person to get appropriate professional help; and Encourage other supports. The training curriculum is based on MHFA guidelines that were developed through expert consensus. These guidelines systematically incorporate the views of professionals, carers, and people with lived experience on how to best provide MHFA (Jorm & Ross, 2018).
The MHFA program has spread to more than 25 countries, and more than 4 million people worldwide have been trained in MHFA (Mental Health First Aid Australia, n.d.). The program is supported by at least 18 controlled trials and meta-analyses of its effects (Maslowski, 2019; Morgan et al., 2018). There is good evidence that the program increases knowledge about mental health problems and their treatments, reduces stigmatizing attitudes, and improves confidence and intentions to provide MHFA. Yet only a handful of randomized controlled trials (RCTs) have examined the impact of the training on the quality of MHFA provided to recipients, and these have shown mixed findings (Forthal et al., 2022; Morgan et al., 2018). Two RCTs found an increase in supportive help 6 months after training (Hung et al., 2021; Jorm et al., 2010). Another RCT in a workplace setting found no difference in the quality of help provided at 1-year follow-up, but a significant improvement at 2-year follow-up toward people outside of work only (Reavley et al., 2021). In contrast, an RCT with parents could not detect an improvement in the quality of help provided to their adolescents 1, 2, or 3 years after Youth MHFA training, although this study had limited power to detect effects (Morgan et al., 2019, 2020). This is a significant gap in the evidence base supporting the program, as ultimately it is important to understand whether the training improves the quality of support provided and leads to better outcomes in recipients of aid.
Demonstration of the extent to which training improves the provision of support requires valid and reliable methods for assessing MHFA behaviors. The measurement of MHFA behaviors in evaluation studies to date has had significant limitations. The principal approach taken has been to collect reports of help provided through open-ended questions in questionnaires. The assessment of behavioral intentions (as opposed to reports of help actually provided) has relied on reactions to a vignette of a person experiencing a mental health problem, whereas for actual helping situations, respondents are asked what they did to help the person. Behavioral intentions are assessed because all participants can provide a response regardless of whether they have had an opportunity to provide actual help, and intentions predict actual behavior, both generally and for MHFA intentions specifically (Rossetto et al., 2016; Sheeran, 2002; Yap & Jorm, 2012). These free-text responses are then scored according to consistency with the MHFA Action Plan to derive an overall score. However, respondents typically give only brief, nonspecific responses, for example, only 16 to 17 words on average (Morgan et al., 2020). Because of this lack of detail, responses typically score artifactually low on MHFA intentions and actions. Although this measurement approach has good inter-rater reliability (Morgan et al., 2019; Reavley et al., 2018), there is likely to be bias from respondents not describing all of the actions taken to support someone, especially if the support was given some time ago. Indeed, respondents tend not to write about the A (approach, assess, and assist) component of the action plan, although this is an inherent part of providing other components of the plan.
Development of a reliable and valid measure of MHFA behaviors is an essential step toward improving the evidence base of MHFA and understanding whether the training leads to better support provided to recipients of aid. There is therefore a need for a formal scale of MHFA behaviors that can be used in evaluations of MHFA courses. While techniques based on role-play have been developed to avoid reliance on self-report when assessing MHFA skills in pharmacy students (El-Den et al., 2021), this approach is limited to a specific context and would not be feasible in most MHFA trials as it is resource-intensive. A promising self-report approach is to create a scale of specific actions that cover the spectrum of actions from the MHFA Action Plan. This approach is likely to have less measurement error and greater validity than a free-text approach. This method was used in an Australian national survey on suicide first aid behaviors, which showed much higher correlations between the quality of intended support and the quality of support given when using a scale of specific actions rather than a free-text approach (Jorm et al., 2018, 2019b). Associations were r = .44 to r = .54 for the specific actions scales (Jorm et al., 2018) but only r = .28 for the free-text approach (Jorm et al., 2019b), suggesting greater measurement error in the free-text approach.
A key consideration in developing an appropriate scale is to use a mix of recommended actions and actions that are not recommended but appear plausible, to increase sensitivity to change and reduce problems with acquiescence and social desirability bias. This would make it possible to demonstrate improvements in recommended actions and reductions in unhelpful or harmful types of support, consistent with the MHFA guidelines. Furthermore, a generic scale that could apply across different mental health problems would be most useful so that scores could be comparable regardless of the recipient’s mental health problem. This is appropriate because the MHFA Action Plan (ALGEE) can be applied regardless of the recipient’s particular mental health problem or crisis (Kitchener et al., 2017).
Aims and Hypotheses
Our aim was to develop a new scale of MHFA behaviors that could be applied to a range of mental health problems and MHFA encounters, and to test the psychometric properties of this newly developed scale. We named the scale the Mental Health Support Scale (MHSS) because, although it was primarily designed for evaluation of MHFA interventions, it has a much broader potential use for measuring supportive behaviors. Validity was assessed by examining whether the scale correlated with related constructs, including knowledge of MHFA, mental health literacy, and stigmatizing attitudes. We hypothesized a positive correlation between scores on mental health support and MHFA knowledge and mental health literacy, and a negative correlation between mental health support and stigma. To examine divergent validity, we examined the correlation between mental health support and empathy, a related but conceptually different construct. We expected a positive correlation that was attenuated due to aspects of the MHSS that are unrelated to empathic responding (e.g., encouraging help-seeking). As evidence in support of scale validity, we also hypothesized that MHSS scores would discriminate between those with and without expertise in MHFA. Finally, we assessed reliability using test–retest correlation.
Method
MHSS Item Pool
A working group with expertise in MHFA course content or evaluation (including MHFA co-founder AFJ) selected potential items for inclusion in the MHSS. We examined findings from Delphi consensus studies used to develop the MHFA guidelines (Bond et al., 2019; Chalmers et al., 2020; Cottrill et al., 2021; Ross et al., 2014a, 2014b). These studies identified statements about first aid actions that were endorsed by panels of experts (lived experience, professionals, and carers). Most potential items were drawn from statements with endorsement rates above 90%, with some items above 80% endorsement. Items that did not describe a clear observable behavior were excluded. Items that could be completed from the perspective of both the first aider and the recipient were selected, to allow for a future scale on help received. Statements that were conditional (beginning with an if clause) were excluded, as we wished to include only items that could be applied to all helping encounters. An example of an excluded conditional statement is “If the person denies anything is wrong or does not wish to talk about what they are experiencing, the first aider should focus on listening rather than trying to change the person’s mind.” Statements that applied across different mental health problems were selected first. Items that applied to specific crises or scenarios were then selected for inclusion in specific subscales. Where necessary, items were rephrased for clarity while maintaining the meaning of the original statement. Items that were not recommended were selected from statements that advised against an action. Some of these statements required editing to increase plausibility, to avoid nonselection due to social desirability bias. For example, “The first aider should not belittle or dismiss the person’s feelings by attempting to say something positive like, ‘You don’t seem that bad to me’” was converted into the item “Try to cheer them up by telling them things don’t seem that bad.”
The pool of potential items was reviewed and refined by colleagues with expertise in MHFA for item clarity and coverage of MHFA actions. Items were then formed into two different scale versions: MHSS–Provided measured the quality of support provided by the respondent to another person in the past, and MHSS–Intended measured the quality of support the respondent would provide toward a hypothetical person with a mental health problem. Each scale comprised the same 42 items about specific MHFA actions, rated as Yes or No (Provided version) or on a 5-point Likert-type scale (Intended version), with 1 = very unlikely, 2 = unlikely, 3 = neither likely nor unlikely, 4 = likely, and 5 = very likely. Items that were not recommended were reverse-scored. There were also five supplementary subscales on help for suicidality, immediate risk of suicide, psychosis, aggressive behavior, and reluctance to seek help, comprising 5 to 10 items each. In the MHSS–Provided version, screening questions determined whether these subscales were relevant and should be completed by respondents. These screening questions were (a) When you were supporting that person, did you ever find out or suspect they might be thinking about suicide? (b) Was the person ever at immediate risk of suicide? (c) When you were supporting that person, did you ever find out or suspect they might be out of contact with reality, for example, experiencing delusions (false beliefs), hallucinations (seeing or hearing things that aren’t real), or paranoia? (d) When you were supporting that person, were they ever reluctant to seek professional help although their mental health problem was having a major impact on their life? (e) When you were supporting that person, did they ever behave aggressively toward you?
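The scoring rule described above (sum the ratings, reverse-scoring items that are not recommended) can be sketched as follows. This is a minimal illustration only: the item keys are hypothetical placeholders rather than MHSS item wording, and the functions are not the authors' scoring code.

```python
from typing import Iterable, Mapping

LIKERT_MIN, LIKERT_MAX = 1, 5  # MHSS-Intended: 1 = very unlikely ... 5 = very likely


def score_intended(responses: Mapping[str, int], not_recommended: Iterable[str]) -> int:
    """Sum 5-point ratings, reverse-scoring not-recommended items (1<->5, 2<->4)."""
    reverse = set(not_recommended)
    total = 0
    for item, rating in responses.items():
        if not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"rating for {item!r} must be between 1 and 5")
        # Reverse-scoring maps a rating r to (min + max - r).
        total += (LIKERT_MIN + LIKERT_MAX - rating) if item in reverse else rating
    return total


def score_provided(responses: Mapping[str, bool], not_recommended: Iterable[str]) -> int:
    """Count Yes answers, counting No as the skilled answer for not-recommended items."""
    reverse = set(not_recommended)
    return sum(int(not done) if item in reverse else int(done)
               for item, done in responses.items())


# Hypothetical item keys, for illustration only.
intended = {"listen_nonjudgmentally": 5, "cheer_up": 1}
provided = {"listen_nonjudgmentally": True, "cheer_up": False}
print(score_intended(intended, {"cheer_up"}))
print(score_provided(provided, {"cheer_up"}))
```

With this convention, higher totals always indicate greater skill, whether the item describes a recommended or a not recommended action.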
Participants
Two sources of participants were used: members of the general population and MHFA Instructors who served as a criterion reference group. Members of the general population were recruited from Prolific.co, a service that allows researchers to recruit participants to complete online surveys or experiments. Adults from high-income English-speaking countries were eligible to participate, including Australia, the United Kingdom, the United States, and Canada. These countries also have active MHFA programs and are therefore appropriate populations in which to test a measure of MHFA actions. To maximize the likelihood of higher quality responses, participants were limited to those who had an approval rate of 90% or above and had completed 50 or more studies on Prolific.
MHFA Instructors were recruited to test the measure in a sample of participants with a high degree of competence in MHFA support. MHFA Instructors undergo intensive training in the course and are required to regularly deliver courses and undertake continuing professional development to maintain their accreditation. MHFA Instructors were recruited from English-speaking countries with established MHFA programs (e.g., Australia, Canada, the United States, England, New Zealand). MHFA Australia approached international MHFA head offices in these countries and asked them to distribute a recruitment flyer among their instructor networks via email.
Measures
MHFA Knowledge
Knowledge about MHFA was assessed with nine questions based on the content of the MHFA course, for example, “If a person you think might be depressed does not want to seek help, it is important to force them if you can,” rated on a response scale of agree or disagree. The number of correct responses was summed. These items have been shown to discriminate between adults who have done an MHFA course and those who have not (Cutler et al., 2018). In this sample, McDonald’s omega = .70.
Mental Health Literacy Scale
Mental health literacy was assessed with a measure that combines recognition of mental disorder and beliefs about treatment effectiveness (Reavley et al., 2014). The measure includes a vignette describing a person with early schizophrenia, followed by an open-ended question on what is wrong with the person in the vignette and five questions on the helpfulness or harmfulness of different interventions. Scores can range from 0 to 6, with higher scores indicating greater mental health literacy. Scale validity is supported by a national survey of health professionals who served as the standard for determining correct responses (Morgan et al., 2013). Higher scores were also associated with greater exposure to mental disorders in self, friends, or family (Reavley et al., 2014). In this sample, McDonald’s omega = .62, with the average interitem correlation being .20, reflecting the broad nature of the construct.
Social Distance Scale
The Social Distance Scale of Link et al. (1999) measures the desire to avoid contact with a person with a mental illness and is often used as an indicator of stigma. It measures intended behavior and includes five items on desired social distance measured on a 4-point Likert-type scale (e.g., willingness to make friends with, or marry a person with a mental illness). Higher scores indicate greater social distance. The measure has shown excellent reliability with α = .88 in community surveys (Yap et al., 2014), and in this sample, McDonald’s omega = .93. Its validity is supported by evidence that people with lower social distance scores report more contact with people with mental disorders (Jorm & Oh, 2009).
Toronto Empathy Questionnaire
The Toronto Empathy Questionnaire is a 16-item measure of global empathy (Spreng et al., 2009). The items represent empathy primarily as an emotional process, for example, “I get a strong urge to help when I see someone who is upset.” Items are rated on a 5-point scale from 0 (never) to 4 (always). The measure has shown excellent internal consistency and test–retest reliability (Spreng et al., 2009), and in this sample, McDonald’s omega = .90. Its validity is supported by associations with other self-report measures of empathy and behavioral measures of interpersonal sensitivity (Spreng et al., 2009), and concordance between self-ratings and informant ratings (Roth & Altmann, 2021).
Sociodemographics
The questionnaire also asked respondents about their age, gender, highest level of education, language spoken at home, country of residence, and training in helping someone with a mental health problem.
Procedure
The study received ethics approval from the University of Melbourne Human Research Ethics Committee. Participants were directed to a questionnaire hosted by Qualtrics, where they read a description of the study and were asked to give informed consent. Participants completed the MHSS–Intended, the measures described above, and were screened for whether they had known someone with a mental health problem in the past 12 months and whether they had tried to help them. Participants recruited through MHFA head offices who screened positive to these two questions completed the MHSS–Provided in the same survey session. Participants recruited through Prolific who screened positive were invited to participate in another survey containing the MHSS–Provided measure only. This was the only method allowed by Prolific to recruit a sample who were eligible based on custom screening questions. A subsample of these participants were invited to complete both versions of the MHSS measure a second time after an interval of 2 weeks, to examine test–retest reliability. In the Prolific sample, the average survey length was 12 minutes for the MHSS–Intended survey (which included the validation measures and sociodemographics) and 4 minutes for the MHSS–Provided survey. The average survey length in the MHFA sample was 17 minutes for participants who did not complete the MHSS–Provided and 24 minutes for those who did.
Participants recruited from Prolific received a payment equivalent to US$10/hour for participating in each part of the study, whereas participants recruited through MHFA received no compensation. An attention check question (e.g., “If you are paying attention, please select Unlikely to this question”) was embedded in the questionnaire to avoid low-quality data from bots and “bad faith” participants recruited through Prolific. Participants who failed this attention check question were rejected for payment and their data were not included.
Statistical Analysis
Item response theory (IRT) was used to guide the selection of items to include in the final MHSS, with analyses conducted separately for MHSS–Intended and MHSS–Provided. IRT formally models the relationship between an item response and the underlying trait (theta) measured by the scale, in this case mental health support skill. IRT can be used in scale development to choose items that provide the most information and measurement precision across the desired range of skill (DeMars, 2010). The IRT models used in this study (two-parameter models) estimate the location of each item and how well each item discriminates between respondents with different skill levels. Difficulty parameters (denoted by b) refer to the skill level at which the probability of answering at or above the particular category equals 50%. Discrimination parameters (denoted by a) provide information on how well an item differentiates between respondents with different skill levels; items with higher discrimination levels contribute more to measurement precision than items with lower discrimination. Item information functions show how much information each item provides across the range of theta, with the population mean of theta set to zero and each unit of the x-axis representing 1 SD of the underlying trait in the population. The item information functions are summed to produce the test information function, which shows how much information or measurement precision the scale provides across different skill ranges. The standard error of measurement is the inverse of the square root of information, so that the greater the information, the smaller the standard error and the greater the reliability (DeMars, 2010).
As an initial step to reduce the item pool, we tested, for each item, whether there was a significant association between sample type (MHFA Instructors vs. the general population) and item response. We wished to eliminate items that did not differentiate between a criterion-reference group (MHFA Instructors) and the general public. Next, the assumption of unidimensionality was investigated by graphing a scree plot of the eigenvalues of the polychoric or tetrachoric correlation matrix and conducting an exploratory factor analysis with the principal factor extraction method using the combined sample (DeMars, 2010). For the MHSS–Intended, a graded response model for polytomous items (i.e., those with Likert-type response scales) was fitted (Samejima, 1969). For the MHSS–Provided, we compared a one-parameter and two-parameter logistic model with a likelihood-ratio test to see which was preferred. Item characteristic curves and item information functions were plotted to examine item performance. We also examined plots of item fit for major discrepancies in the central area (±2 SDs). We aimed to select items with at least moderate levels of discrimination (values at or above 0.65; Baker, 2001) that provided nonredundant information across a range of mental health support skill (±2 SDs). We also aimed to select items from each component of the MHFA Action Plan (ALGEE), and that were the same across MHSS–Intended and MHSS–Provided. For instance, where there were multiple candidate items in the MHSS–Intended that showed a similar level of performance, we selected the item that performed the best in the MHSS–Provided. Finally, the IRT models were re-run on the final list of items and the test information function was plotted.
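The initial screening step can be illustrated for a single Yes/No item. The text does not name the test used, so the sketch below assumes a Pearson chi-square test on a 2 × 2 table of sample type by item endorsement; the counts are invented and the function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(yes_a: int, no_a: int, yes_b: int, no_b: int) -> Tuple[float, float]:
    """Pearson chi-square statistic and p value (df = 1) for a 2 x 2 table.

    Rows are the two samples (e.g., instructors vs. general population);
    columns are Yes/No responses to one item. No continuity correction.
    """
    n = yes_a + no_a + yes_b + no_b
    row_a, row_b = yes_a + no_a, yes_b + no_b
    col_yes, col_no = yes_a + yes_b, no_a + no_b
    if 0 in (row_a, row_b, col_yes, col_no):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (yes_a * no_b - no_a * yes_b) ** 2 / (row_a * row_b * col_yes * col_no)
    # For df = 1, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: an item endorsed by 90/100 in one sample and 60/100 in the other.
chi2, p = chi_square_2x2(90, 10, 60, 40)
print(round(chi2, 2), p < 0.05)
```

Under this assumption, an item whose p value is not significant would be dropped as failing to differentiate the criterion group from the general public.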
To test scale validity, associations between variables were examined with Pearson’s correlation coefficient or Spearman’s correlation coefficient where there was substantial skew. Independent-samples t tests examined the difference in means between those with and without expertise in MHFA, with Cohen’s d effect sizes reported. Test–retest reliability was evaluated by calculating intraclass correlation coefficients (ICC, agreement form). Analyses were conducted with Stata 16 using the irt commands.
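As an illustration of the effect size reported alongside the t tests, Cohen's d can be computed as the difference in group means divided by a pooled standard deviation. The sketch below assumes the common pooled-variance form; the source does not state which variant was used, and the scores are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation of two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Invented subscale totals for two small hypothetical groups.
experts = [78, 80, 75, 82, 79]
public = [70, 66, 72, 68, 71]
print(round(cohens_d(experts, public), 2))
```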
Results
We received 471 responses on the MHSS from Prolific. Of these, seven did not complete all questionnaires, 12 failed the attention check question, and two withdrew their consent, leaving a sample of 450. There were 239 participants (53.1%) who reported knowing someone well who had developed a mental health problem or crisis in the previous 12 months, and they were invited to complete the MHSS–Provided. We received responses from 209 participants, but nine did not complete the questionnaire and one person failed the attention check, leaving 199 responses on the MHSS–Provided. For those recruited through MHFA, 249 completed the MHSS–Intended but two withdrew consent, leaving 247. Of these, 187 (75.7%) completed the MHSS–Provided. Across the two samples, there were 697 participants who completed the MHSS–Intended and 386 who completed the MHSS–Provided.
Overall, participants were aged 40 years on average, the majority were female, just over half were from the United Kingdom, and two-thirds had a tertiary education (see Table 1). As expected, demographics varied between MHFA Instructors and the general population sample recruited through Prolific. Of the Prolific sample, most (82.7%) had not completed any training or course in how to help someone with a mental health problem, but 37 (8.2%) reported completing an MHFA course, 27 (6.0%) reported professional training (e.g., psychologist, nurse, doctor), and 15 (3.3%) reported some other form of training (e.g., a psychology undergraduate degree).
Participant Characteristics.
Note. MHFA = Mental Health First Aid.
MHSS–Intended Scale
Out of 74 candidate items, seven items did not show a significant association between subsample (general population vs. MHFA) and item score, and were eliminated from further consideration (see Table S1 in Supplementary Material for these items). Examination of the scree plot and eigenvalues did not provide support for the items forming a unidimensional scale (first eigenvalue = 20.88 and second eigenvalue = 6.76). A factor analysis suggested two factors that aligned with items that were recommended (n = 45 items) and not recommended (n = 22 items). Separate graded response models were therefore conducted on the Recommended and Not Recommended items. The output from these is provided in Table S1 in Supplementary Material. As most items showed high levels of discrimination, we focused on items that provided the most information across the middle range of skill level (±2 SDs). The items were reduced to 17 recommended items and six not recommended items, forming MHSS–Intended–Recommended (MHSS-I-R) and MHSS–Intended–Not Recommended (MHSS-I-NR) subscales, respectively. Item parameters are shown in Table 2, the final scale is shown in Supplementary File 1, and factor loadings are shown in Table S2 in Supplementary Material. Each component of the MHFA Action Plan is represented among the 23 items, with additional items measuring first aid actions for a person experiencing suicidality, immediate risk of suicide, psychotic symptoms, and reluctance to seek professional help when needed. We aimed to include items on how to help a person behaving aggressively, but none of the candidate items performed adequately, due to poor fit and/or low levels of information. Each scale showed evidence of unidimensionality. For the MHSS-I-R, all item-total correlations were above .3 (rs = .47–.73), and the average interitem correlation was .33 with all tightly clustered around the mean value (rs = .32–.34), as per guidelines (Clark & Watson, 2019). 
The MHSS-I-NR showed similar item-total correlations (rs = .61–.83) and the average interitem correlation was .45 (rs = .42–.50). Omega was .93 for the MHSS-I-R and .87 for the MHSS-I-NR. The test information functions indicated both scales have acceptable information (or measurement precision) across a range of mental health support skill up to about 1.5 SDs above average (see Figure 1).

Test Information Functions for the Mental Health Support Scale–Intended.
Parameter Estimates for the MHSS-I and MHSS-P.
Note. MHSS-I= Mental Health Support Scale–Intended; MHSS-P = Mental Health Support Scale–Provided; a = item discrimination (how well an item can differentiate between individuals at different skill levels); b = item difficulty or item thresholds (skill level in which the probability of answering at or above the particular category equals 50%); * = discrimination parameter is identical for the MHSS-P-NR items as a one-parameter model fit the data better than a two-parameter model. (A) = Approach, assess, and assist with any crisis; (L) = Listen and communicate nonjudgmentally; (G) = Give support and information; (EP) = Encourage the person to get appropriate professional help; (EO) = Encourage other supports; (S) = Suicidal behavior; (I) = Immediate risk of suicide; (P) = Psychotic symptoms; (R) = Reluctance to seek help.
For the MHSS-I-R, scores could range from 17 to 85 and were on average M = 71.76 (SD = 8.85). MHSS-I-NR items were reverse-scored and total scores could range from 6 to 30 (with higher scores indicating greater skill) and in this study averaged M = 21.06 (SD = 5.30). Mean scores by sample type are also shown in Table 3. There was a moderate positive correlation between scores on the Recommended and Not Recommended subscales of the MHSS–Intended (see Table 4).
Scores on the MHSS-I and MHSS-P Between MHFA Instructors and Other Participants.
Note. MHSS-I= Mental Health Support Scale–Intended; MHSS-P = Mental Health Support Scale–Provided; MHFA = Mental Health First Aid.
Correlation With 95% Confidence Intervals Between MHSS and Validation Measures (n = 697).
Note. MHFA = Mental Health First Aid; MHSS-I-R = Mental Health Support Scale–Intended–Recommended; MHSS-I-NR = Mental Health Support Scale–Intended–Not Recommended; MHSS-P-R = Mental Health Support Scale–Provided–Recommended; MHSS-P-NR = Mental Health Support Scale–Provided–Not Recommended.
n = 386.
MHSS–Provided Scale
For the MHSS–Provided, 386 participants completed all 42 items on actions that could apply to all first aid encounters. These encounters were mainly with family (40.9%) or friends (34.7%), but intimate partners (15.0%), work colleagues (6.2%), and other (3.1%; for example, student, friend’s daughter) were also reported.
Additional subscales were only completed by participants when the actions were relevant to the circumstances of the encounter, for example, when the person was showing signs of suicidality. The number of participants completing these subscales was as follows: suicide (n = 145), immediate risk of suicide (n = 34), psychotic symptoms (n = 56), reluctance to seek help (n = 216), and aggression (n = 43). As the sample size for these subscales—except for suicide and reluctance to seek help—was below minimum recommendations for IRT models (DeMars, 2010; Morizot et al., 2007), these are not presented. Findings for the suicide and reluctance to seek help items are shown in Table S3 in Supplementary Material.
There were 10 items that did not show a significant association between participant source and item endorsement, and these were not considered further (see Table S3 in Supplementary Material for these items). A factor analysis on the remaining 32 items showed two factors corresponding to 23 Recommended and nine Not Recommended actions. Two-parameter logistic models were fit to each of these subscales and the output from these is shown in Table S3 in Supplementary Material. We selected items with at least moderate levels of discrimination, that provided good levels of information across a range of mental health support skill levels, that did not show major problems in item fit, and that represented each component of the MHFA Action Plan. Nine recommended items (MHSS–Provided–Recommended [MHSS-P-R]) and three not recommended items (MHSS–Provided–Not Recommended [MHSS-P-NR]) were selected. Item parameters are shown in Table 2, the final scale is shown in Supplementary File 2, and factor loadings are shown in Table S4 in Supplementary Material. These items match the corresponding actions in the MHSS–Intended. Each scale showed evidence of unidimensionality. For the MHSS-P-R, all item-total correlations were above .3 (rs = .50–.70), and the average interitem correlation was .31 with all tightly clustered around the mean value (rs = .29–.32). The MHSS-P-NR was similar, with item-total correlations ranging from rs = .66–.74 and an average interitem correlation of .25 (rs = .18–.35). McDonald’s omega was .93 for the MHSS-P-R and .78 for the MHSS-P-NR. The test information functions show information peaks at below average levels of skill and less measurement precision at above average levels of mental health support skill (see Figure 2).

Figure 2. Test Information Functions for the Mental Health Support Scale–Provided.
For the MHSS-P-R, scores could range from 0 to 9 and were on average M = 7.09 (SD = 2.11). MHSS-P-NR scores could range from 0 to 3 (with higher scores indicating greater skills) and in this study averaged M = 2.43 (SD = 0.74). Mean scores by sample type are also shown in Table 3. As shown in Table 4, there was a small positive correlation between the Recommended and Not Recommended scales of the MHSS-P. Of note, there were large positive correlations between the Intended and Provided measures on corresponding subscales.
Validation
There were moderate to large correlations between the MHSS and the MHFA knowledge scale, with larger correlations for the Not Recommended scales compared with the Recommended scales (see Table 4). As expected, there were small to medium correlations with mental health literacy and social distance. Correlations with the Empathy measure varied in size from small to large, depending on the particular scale, and our hypothesis that there would be attenuated correlations compared with other validation measures was only partially supported. We also examined the correlations in the Prolific sample alone to assess associations in a general population sample. As might be expected given the exclusion of the expert instructors, these correlations were attenuated and are shown in Table S5 in Supplementary Material. The MHSS also discriminated between MHFA Instructors and members of the general public, with large to very large differences in mean scores, particularly for the MHSS–Intended (see Table 3).
For the analysis of test–retest reliability, 80 respondents provided test–retest data on the MHSS–Intended, 70 of whom also completed the MHSS–Provided. Test–retest correlations were acceptable for the MHSS-I-R (ICC = 0.83), MHSS-I-NR (ICC = 0.79), and MHSS-P-R (ICC = 0.77) but were somewhat lower for the three-item MHSS-P-NR (ICC = 0.59).
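To illustrate how a test–retest intraclass correlation can be obtained from paired scores, the sketch below computes a two-way random-effects, absolute-agreement, single-measure coefficient, often labeled ICC(2,1). The ICC form used in the study is not stated here, so this choice is an assumption, and the function name and example scores are hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one per respondent, each holding that
    respondent's score at every occasion (e.g., [time1, time2]).
    """
    n = len(scores)      # respondents
    k = len(scores[0])   # occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scale totals at baseline and retest, NOT study data.
example_scores = [(7, 7), (9, 8), (4, 5), (6, 6), (8, 9)]
print(round(icc_absolute_agreement(example_scores), 2))
```

Because this form penalizes systematic shifts between occasions as well as changes in rank order, it is a conservative choice for retest data; a consistency-type ICC would ignore a uniform shift in scores.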
Discussion
This study aimed to develop and validate a new self-report scale of MHFA behaviors named the Mental Health Support Scale. The MHSS–Intended includes 23 items and can be used to measure intended actions to support a person developing a mental health problem or crisis, while the MHSS–Provided includes 12 items and measures actual help provided. Both versions include matching actions that can apply to a range of mental health problems and MHFA encounters. Study results provided evidence for validity, as MHSS scores discriminated between respondents with and without expertise in MHFA and correlated with higher MHFA knowledge and mental health literacy, and lower stigmatizing attitudes. The scales demonstrated high reliability when measured by omega and test–retest correlation over 2 weeks, with IRT models indicating measurement precision across a range of mental health support skill levels. Although the three-item Not Recommended subscale of the MHSS-P showed lower reliability than the other scales, this is not unexpected given its brevity.
The MHFA guidelines include recommendations on things to do as well as what to avoid doing. Although previous evaluations of MHFA actions or intentions typically derived an overall quality score (e.g., Hung et al., 2021; Morgan et al., 2019), our analysis indicates that separate subscales of recommended and not recommended actions are appropriate as they measure different constructs and are not highly correlated. MHFA training may have a differential impact on each, underscoring the need to evaluate changes in recommended and not recommended actions separately to understand how the extent and quality of support change.
The items in the MHSS were derived from actions that experts overwhelmingly agreed should be included in MHFA guidelines. While some actions could be considered essential to providing aid and should therefore be assessed (e.g., “approach them about your concerns about their mental health,” “listen to their problems without expressing any judgment”), they did not provide useful information as scale items and were not included in the final scale. This is because they were too “easy” and were endorsed by almost all respondents. Actions that were more “difficult” and provided greatest discrimination at higher skill levels tended to be more specialized actions that went beyond empathic responding, such as asking about thoughts of harm to self or others, discussing privacy and confidentiality wishes, and specific actions related to suicide. The higher difficulty of the suicide items is consistent with findings from surveys of the public assessing suicide literacy, which show that there are persistent myths around the danger of asking about suicide (Nicholas et al., 2020). These actions are more specialized and it is therefore less likely that untrained people will respond correctly, unlike the actions that a caring person might be expected to do regardless of training.
The MHSS–Intended and MHSS–Provided will be useful in different circumstances. While the ultimate aim of an evaluation of MHFA training is to show actual rather than intended changes in support, this is rarely practical to assess in trials and course evaluations. MHFA trainees need to have the opportunity to help someone, and this may not occur during the follow-up period of an evaluation. For example, data from this study and others (Reavley & Jorm, 2015) suggest that about 50% of the general population report knowing someone with a mental health problem in the previous 12 months. A measure of intended support is therefore useful to assess change in MHFA skills, particularly when help provided cannot be assessed. The MHSS–Intended will also be useful to measure skills in particular populations, when data are required from an entire sample rather than a subset who have provided help. This study showed strong correlations between intentions and behavior measured cross-sectionally (rs > .5), supporting the use of an intentions scale as a proxy for changes in skills. This is consistent with meta-analyses that show a large correlation (r = .53) between intentions and prospective behavior (Sheeran, 2002). It is also similar in magnitude to the correlation between quality of intended support and actual support given for suicide when measured cross-sectionally (Jorm et al., 2018). As the MHSS share items, there is an opportunity to examine how well intentions predict future behavior at an individual item level. Differences between intentions and actions taken may occur because the person helped may be different to the target person of the MHSS–Intended, and the relationship and particular circumstances may differ (Rossetto et al., 2018). 
In addition, theories of prosocial behavior may inform how different barriers function to prevent helping along the pathway from first noticing a problem, assuming responsibility to help, deciding how to help, and having confidence in the capacity to help (Darley & Latane, 1968). In particular, barriers to providing MHFA may include the person being perceived as difficult to approach or having a severe mental illness, or having a less close relationship with them (Morgan & Rossetto, 2022), which may impact confidence in helping or taking responsibility to help. However, examining the link between intended action and actual action may increase our understanding of the barriers that prevent acting even when helpers know what they should do, and these barriers could be targeted further in training.
To our knowledge, this is the first study to systematically develop a self-report measure of MHFA behaviors. Our study has several strengths. Scale items were drawn directly from statements used to develop MHFA guidelines, supporting the content validity of the measure. The study included a general population sample with a range of skill levels as well as a large sample of participants with expertise in MHFA. The use of IRT models allowed us to choose items that performed well across a range of MHFA skill levels and that could discriminate between respondents with and without expertise. Nevertheless, some limitations should be noted. First, the MHSS does not include items that could apply to specific helping scenarios, including helping a person with substance misuse, a panic attack, or an eating disorder. MHFA guidelines for these problems were planned to be revised during the development of the MHSS, and therefore a set of updated actions was not available to test. Despite this, the MHSS items that represent the ALGEE Action Plan are still relevant to these problems, as they apply to all MHFA encounters. Second, we did not collect sufficient data to identify items that performed well on the optional subscales of the MHSS–Provided (e.g., suicide, reluctance to seek help, and psychotic symptoms). These subscales addressed situations that only a minority of respondents encountered, and further data are needed to assess mental health support skills for these situations. Items measuring responses to aggressive behavior did not perform well and may require new candidate items that provide more information across a range of skill levels. Finally, our study included more females than males in both the general population sample and MHFA Instructor sample. However, this imbalance is consistent with the population of MHFA Instructors and participants in studies evaluating MHFA training (Morgan et al., 2018). If the MHSS were to be used in a population-based sample to measure helping skills, then an investigation of gender-based differential item functioning with a larger sample of males would be prudent.
Future research on the MHSS should focus on addressing study limitations and extending the application of the instrument. Testing the scale in another large sample of the general population would give the opportunity to confirm the factor structure and investigate whether the MHSS–Intended scale could be shortened. In particular, more data are needed to investigate item performance in the subscales of the MHSS–Provided (e.g., suicide, reluctance to seek help, and psychotic symptoms). This would inform whether items addressing these problems could be added as optional items where relevant. Future research could consider validating the MHSS with samples of people with different characteristics, for example, race/ethnicity. A recipient version of the measure could also be developed, to be completed by individuals who have been supported by someone when they were developing a mental health problem or crisis. The items in the MHSS were based on observable actions so that the scale could be completed by the person receiving help and not just the person providing it. The perspective of the person helped is an important component in understanding helping encounters and the quality of the support provided. The helper may believe they have provided appropriate first aid actions, such as communicating clearly and simply, but the recipient may perceive things differently. Finally, further research could use the scale to evaluate the effectiveness of MHFA training on providing support. A validated scale that is short and easy to administer and score would simplify data collection on help provision. This could enable more frequent assessment of help provision in the weeks and months following training, which could increase understanding of the impact of MHFA training courses.
In conclusion, our findings support the MHSS as a valid and useful measure of MHFA behaviors. It may prove useful in addressing the research gap on whether MHFA training leads to better support provided to recipients of aid. In addition, the scales could be used by other early intervention programs (e.g., Applied Suicide Intervention Skills Training; Rodgers, 2010) to evaluate the effectiveness of these interventions in promoting successful behavioral change in supporting people with mental health problems.
Supplemental Material
Supplemental material files sj-docx-1-asm-10.1177_10731911221106767 through sj-docx-7-asm-10.1177_10731911221106767 for Development of the Mental Health Support Scale: A New Measure of Mental Health First Aid Behaviors by Amy J. Morgan, Judith Wright, Andrew J. Mackinnon, Nicola J. Reavley, Alyssia Rossetto and Anthony F. Jorm in Assessment
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded through a CR Roper Fellowship held by AMo.
Supplemental Material
Supplemental material for this article is available online.
