Abstract
BACKGROUND:
Paramedics working in helicopter teams undertake water and land rescues. Historical assessments of role-related fitness were not developed using physical employment standards methodology.
OBJECTIVE:
To compare the historical selection tests with new tests developed via contemporary scientific methodology.
METHODS:
Candidates undergoing selection to the role of flight paramedic (n = 14; age 37±5 yrs, body mass index [BMI] 26±4 kg.m–2) undertook the existing paramedic selection tests on land and in water, during which task duration, maximum heart rate (HRmax), rating of perceived exertion (RPE6 - 20) and maximum capillary blood lactate (Lacmax) were recorded. These results were compared with the same variables in experienced paramedics (n = 14; age 44±5 yrs, BMI 25±3 kg.m–2) who undertook the new tests.
RESULTS:
On land, task duration (existing 17±2 min vs. proposed 7±2 min, p < 0.05), HRmax (existing 186±13 b.min–1 vs. proposed 173±11 b.min–1, p < 0.05) and Lacmax (existing 23±3 mmol.L–1 vs. proposed 8±2 mmol.L–1, p < 0.05) were all greater for the existing test than for the proposed test. In water, task duration was longer for the existing test (existing 12±2 min vs. proposed 10±1 min, p < 0.05), but HRmax (existing 166±18 b.min–1 vs. proposed 167±15 b.min–1, p = 0.90) and Lacmax (existing 11±4 mmol.L–1 vs. proposed 11±4 mmol.L–1, p = 0.90) did not differ. RPE6 - 20 did not differ between groups for either water or land.
CONCLUSIONS:
The historical land-based physical tests for paramedics differed from the proposed tests, whereas the historical water-based test imposed similar physiological demands to the proposed test, with a duration only about 2 min longer. Using tests that were not developed via established scientific methodologies risks eliminating candidates who are suitable for the role, or including candidates who are not.
Introduction
Rescue paramedics and other skilled professionals working in specialised teams such as helicopter emergency medical services (HEMS) regularly engage in physically demanding work, which can involve search and rescue tasks in remote wilderness or open water [1, 2]. Organisations employing these personnel must establish minimum physical standards during recruitment and selection to assess the suitability of individuals for specialist emergency response teams [3]. To ensure accurate, fair, and equitable selection processes, formal physical employment standards (PES) must be developed through a scientific, systematic, and evidence-based approach [4]. Establishing PES via this methodology prevents discrimination against capable individuals while identifying those who may be unsuitable for the role, thereby increasing employer accountability and equality in staff selection. Formal PES may also reduce the risk of injury during work tasks and, in turn, enhance the capability and reliability of service delivery [5]. Finally, although formal scrutiny of failed selection has historically been uncommon, evidence-based PES can provide a defence against legal challenges from unsuccessful candidates [6].
Several procedures for developing scientifically valid PES have been published [7–9], the benchmark being the six-stage methodology described by Tipton et al. [4] for establishing defensible physical standards for employment. Developing PES involves identifying critical occupational tasks and understanding the interactions between personnel, the environment, equipment, and the physical demands of the role [4]. This process should analyse both common daily tasks and intermittent yet essential tasks, such as the specialised emergency procedures performed by rescue and first response teams. Historically, fitness tests used for selection in physically demanding occupations have shown poor reliability and validity relative to the actual work environment. Such tests may be considered arbitrary, not fit for purpose, and potentially discriminatory [7, 10]. Greater effort is needed to assess existing, potentially inaccurate physical tests for their relevance to job requirements, and to examine new tests developed through a formal PES process.
The most accurate assessment of cardiovascular work is achieved by quantifying oxygen consumption (VO2) during exercise [11]. Portable indirect calorimetry devices, such as the Metamax ambulatory gas analysis system and the Cosmed AquaTrainer metabolic measurement system, have proven accurate for assessing oxygen consumption during occupational activities [12, 13]. However, these devices are often inaccessible, primarily because of cost, and this and other limitations may hinder organisations from developing scientifically valid PES. Surrogate measures of physiological demand, including blood lactate, heart rate, and rating of perceived exertion (RPE), have been used in extreme environments such as water rescue and wilderness activities [14, 15]. In the absence of the technology or funding needed to assess VO2, these supplementary indices may be sufficient to provide gross information on physiological demands for a PES. The same measures have been used to estimate physiological demand in both sports [16, 17] and occupational settings [18, 19].
The physiological demands of HEMS paramedics performing physically demanding rescues on land and in water have been established in previous work. Water-based rescues require paramedics to work at 81% of VO2peak for 10.2±1.0 min, and land-based rescues at 86% of VO2peak for 7.0±3.6 min [20]. For HEMS paramedics, PES have historically been established through non-scientific means, with gross estimations of task requirements and the resultant creation of generic physical tests to assess performance [21]. During work to evaluate PES in a single Australian HEMS, candidates undergoing selection to the role undertook the existing historical tests while new, scientifically developed tests were concurrently being established. The aim of this study was to compare the physiological demands of the historical physical tests (land- and water-based), undertaken by new candidates, with those of the new proposed tests, undertaken by experienced HEMS paramedics. We expected the historical tests to induce excessive physiological demands compared with the new tests aligned to job-specific physiological demands.
Methods
Participants and experimental design
Two groups of paramedics completed land- and water-based physical task tests to ascertain the physiological demands of task performance. Group 1 were candidates undergoing the existing selection process, including physical testing, for employment as an Intensive Care Flight Paramedic (ICFP) with Ambulance Victoria (AV). Group 2 were qualified ICFPs employed by AV. AV is the sole provider of HEMS to the state of Victoria (Australia), an area of 237,629 km2 that includes remote and difficult-to-access terrain and more than 2,000 km of coastline. AV helicopters are staffed by an ICFP, who is deployed from the aircraft to perform land- and water-based winch (also known as “hoist”) rescues, an air crew officer (who operates the winch, among other duties), and a pilot. These operations have been described in detail previously [1, 21].
Eligibility for the study required Group 1 participants to be experienced paramedics employed by AV and undergoing selection to the ICFP role. Group 2 participants were paramedics with at least two years’ experience as a qualified ICFP with AV and current certification to undertake all duties, including winch rescue. The limitations of using independent groups were acknowledged and discussed with the supporting organisation, which logistically could not provide the same participants for both sets of tests because of complications arising from the COVID-19 pandemic. The supporting organisation gave permission only for recruit paramedics (Group 1) to undertake the existing tests and could not provide access to baseline fitness results; therefore, the two groups could not be matched.
Participants were recruited via convenience sampling using corporate email. All participants completed a form describing their demographics and experience in paramedic roles. Written informed consent was obtained from all participants included in the study. The study was approved by the Monash University Human Ethics Research Committee (Project number 19051) and the Ambulance Victoria Research Committee. Before reporting to the testing locations, participants were required to avoid strenuous exercise in the preceding 24 hours and to have been free from illness for the preceding 14 days. To identify risks associated with completing a strenuous exercise test, all participants completed the Exercise and Sports Science Australia Pre-Exercise Screening Tool [22].
Measurements
Height and body mass were measured, and body mass index (BMI) calculated, for participants in both groups prior to each test. Body mass was measured using calibrated SECA 813 Digital Flat Scales (SECA, Hamburg, Germany), and height using a SECA 213 Portable Stadiometer (SECA, Hamburg, Germany). After 3 min of sitting, capillary blood (5 μL) was sampled from a fingertip for analysis of resting blood lactate concentration (Lactate Pro2, Arkray, Tokyo, Japan). Heart rate was continuously sampled via chest strap and smart watch (Garmin Forerunner 735XT watch and HRM SWIM strap, Garmin, Olathe, Kansas, USA).
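BMI is derived from the two measurements above as body mass (kg) divided by the square of height (m), giving the kg.m–2 units reported in the Results. A minimal Python sketch of that calculation follows; the example participant is hypothetical and not drawn from the study data.

```python
def body_mass_index(mass_kg: float, height_m: float) -> float:
    """BMI in kg.m-2: body mass (kg) divided by the square of height (m)."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return mass_kg / height_m**2


# Hypothetical participant for illustration only (not study data)
print(f"{body_mass_index(80.0, 1.75):.1f} kg.m-2")
```

For the hypothetical 80 kg, 1.75 m participant this prints 26.1 kg.m-2.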
Physical performance test protocols: Land
Group 1 (existing test):
The test for Group 1 was the established selection test for AV ICFP candidates. Participants were briefed on the study protocol and familiarised with the Rating of Perceived Exertion (RPE6 - 20) scale [23]. Participants walked 850 m up a gravel track whilst carrying land operational rescue equipment weighing 43.4 kg; the individual equipment weights are detailed in previous work [21]. The track gained 104 m in height with an average gradient of 12.2%. Participants were then required to walk 850 m back down the track to the start point, within a time limit of 20 min as determined by the supporting organisation (AV). Participants wore the standardised AV operational uniform, a one-piece cotton flight suit (Australian Defence Apparel, Thomastown, Victoria, Australia; fabric weight 200 g·m–2), and hiking boots (total weight 3.9 kg). To avoid interfering with candidates’ performance during a real-world selection test, researchers could sample RPE6 - 20 and maximum blood lactate (Lacmax) only at the maximum elevation (i.e., the top of the 850 m walk/turnaround point).
Group 2 (proposed test):
The test for Group 2 was informed by previous work [20]. Participants were briefed on the study protocol and familiarised with the RPE6 - 20 scale [23]. Participants walked 250 m along a steep track whilst carrying land operational rescue equipment weighing 43.4 kg, as detailed in previous work [21]. The track gained 45 m in height with an average gradient of 18.3%. Participants wore the standardised AV operational land uniform: a two-piece fire-retardant Nomex® flight suit (Sisley Clothing, Maryland, New South Wales, Australia; jacket and pants fabric weight 200 g·m–2 each), a cotton undershirt, and sturdy hiking boots (total weight 3.9 kg). There was no time limit. For the purposes of comparison with Group 1, we report the equivalent indices (RPE6 - 20, HRmax and Lacmax) sampled at the end of the test (i.e., at maximum elevation, the top of the 250 m walk).
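The average gradients of the two tracks can be checked against the stated distances and elevation gains. The method used to compute the reported gradients is not stated, so this minimal Python sketch computes both common conventions: rise over distance walked along the track, and rise over horizontal run (assuming, for illustration, a straight incline so that run = √(path² − rise²)).

```python
import math


def gradient_over_path(rise_m: float, path_m: float) -> float:
    """Average gradient (%) as elevation gain divided by distance walked."""
    return 100.0 * rise_m / path_m


def gradient_over_run(rise_m: float, path_m: float) -> float:
    """Average gradient (%) as elevation gain divided by horizontal run.

    Assumes a straight incline, so run = sqrt(path^2 - rise^2).
    """
    run_m = math.sqrt(path_m**2 - rise_m**2)
    return 100.0 * rise_m / run_m


# (elevation gain, distance walked) in metres, as stated in the protocols
tracks = {
    "existing land test": (104.0, 850.0),
    "proposed land test": (45.0, 250.0),
}
for name, (rise, path) in tracks.items():
    print(f"{name}: {gradient_over_path(rise, path):.1f}% over path, "
          f"{gradient_over_run(rise, path):.1f}% over run")
```

The existing track gives 12.2% over path and 12.3% over run; the proposed track gives 18.0% over path and 18.3% over run. The reported 12.2% is therefore consistent with the first convention and the reported 18.3% with the second.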
Physical performance test protocols: Water
Group 1 (existing test):
The test for Group 1 was the established selection test for AV ICFP candidates. Participants were briefed on the study protocol and re-familiarised with the RPE6 - 20 scale [23]. Participants entered a 50 m pool wearing the standardised AV operational uniform, a one-piece cotton flight suit (Australian Defence Apparel, Thomastown, Victoria, Australia; fabric weight 200 g·m–2), and hiking boots (total weight 3.9 kg). Phase 1 required participants to stand 15 m from the end of the pool and swim underwater, without surfacing, to the pool end. Phase 2 required participants to tread water for 2 min and kick off their hiking boots. Phase 3 required participants to swim 400 m using any stroke while remaining in the flight suit. The time limit for completing Phases 1 to 3 was 17 min 30 s, as determined by AV. Phase 4 required participants to tread water for 10 min. To avoid interfering with candidates’ performance during a real-world selection test, researchers could sample RPE6 - 20, maximum heart rate from the smart watch (HRmax) and maximum blood lactate (Lacmax) only at the end of Phase 2 and Phase 3.
Group 2 (proposed test):
The test and data for Group 2 were informed by previous work [20]. Participants were briefed on the study protocol and re-familiarised with the RPE6 - 20 scale [23]. The test required participants to swim 50 m in open water using any stroke whilst wearing AV operational rescue equipment, use the AV operational winch rescue sling (known as a strop; Safety Equipment Technical Services, Monbulk, Victoria, Australia) to rescue a 40 kg manikin from a life raft, and then swim-tow the manikin 25 m back from the raft (a total swim distance of 75 m per bout). The task was then repeated once. Participants wore the operational water rescue uniform, including an immersion suit (OWFS, Ursuit, Turku, Finland), a full-body winch harness, winch hook connector, rescue strop for patient recovery, a lifejacket containing signalling equipment, snorkel, mask, fins and a water rescue helmet. To simulate the real-world effect of towing a helicopter winch cable, a 75 m time-expired section of winch cable and a winch hook were connected to the participant. The total weight of all equipment was 16.1 kg; the individual weights are detailed in previous work [21]. There was no time limit. For the purposes of comparison with Group 1, we report the equivalent indices (RPE6 - 20, HRmax and Lacmax) sampled at the end of the test (i.e., after 2 × 75 m bouts).
Statistical analysis
Lactate and HR data were assessed for normality using the Shapiro-Wilk test and found to be normally distributed. Data are reported as mean±SD. Descriptive statistics and independent samples t-tests were used to compare Group 1 and Group 2 demographic, HR, lactate, and anthropometric data. To examine effect sizes for HR, lactate and task duration, we calculated Cohen’s d with thresholds of small (0.2), medium (0.5) and large (0.8). RPE data were compared using the Mann-Whitney U test and are reported as median (IQR). Differences in elapsed time were used to compare test duration with maximal acceptable work duration (MAWD), calculated as $\mathrm{MAWD} = 95.336 \times e^{-7.28 \times \%\mathrm{VO_{2}peak}}$ as determined in previous work [20]. All analyses were conducted using Prism (Version 8.4.3; GraphPad Software, San Diego, California, USA), and statistical significance was set a priori at p < 0.05.
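The Methods do not state which standardiser was used for Cohen’s d. Assuming the pooled standard deviation, the usual choice for two independent groups, the minimal Python sketch below recomputes d from the rounded group summaries reported in the Results and classifies it with the thresholds given above. The "negligible" label for values below 0.2 is an assumption, not a category defined in this study.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Absolute Cohen's d: difference in means divided by the pooled SD."""
    return abs(m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def magnitude(d: float) -> str:
    """Classify |d| with the study's thresholds (0.2, 0.5, 0.8)."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # assumed label for |d| < 0.2


# (mean, SD, n) for the existing test vs. the proposed test, from the Results
comparisons = {
    "land HRmax": ((186, 13, 14), (173, 11, 14)),
    "land Lacmax": ((23, 3, 14), (8, 2, 14)),
    "water HRmax": ((166, 18, 14), (167, 15, 14)),
    "land duration": ((17, 2, 14), (7, 2, 14)),
    "water duration": ((12, 2, 14), (10, 1, 14)),
}

for name, (existing, proposed) in comparisons.items():
    d = cohens_d(*existing, *proposed)
    print(f"{name}: d = {d:.2f} ({magnitude(d)})")
```

With these rounded summaries the sketch gives d = 1.08, 5.88, 0.06, 5.00 and 1.26 respectively, matching the reported effect sizes. With equal group sizes the pooled SD reduces to the root mean square of the two SDs.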
Results
Group 1 included 14 ICFP candidates (age 37±5 yrs, BMI 26±4 kg.m–2, 2 female) who completed the existing selection tests. Group 2 included 14 qualified and experienced ICFPs (age 44±5 yrs, BMI 25±3 kg.m–2, 2 female) who completed the newly developed land and water assessments (Table 1).
Table 1. Participant demographics for Group 1 (Recruits, n = 14) compared with Group 2 (Experienced ICFP, n = 14). Data are presented as mean (±SD)
For the land tests, HRmax (186±13 b.min–1 vs. 173±11 b.min–1, p < 0.05, d = 1.08) and Lacmax (23±3 mmol.L–1 vs. 8±2 mmol.L–1, p < 0.05, d = 5.88) were significantly higher for the existing test than for the new test (Fig. 1). For the historical and new water rescue tests, HRmax (166±18 b.min–1 vs. 167±15 b.min–1, p = 0.90, d = 0.06) and Lacmax (11±4 mmol.L–1 vs. 11±4 mmol.L–1, p = 0.90, d = 0.00) did not differ (Fig. 2). RPE6 - 20 did not differ between groups for either land or water (Figs. 1 and 2).
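The reported p values come from independent samples t-tests run in Prism on the raw data. As an approximate cross-check only, Student’s pooled-variance t statistic can be recomputed from the rounded summaries above. The minimal Python sketch below does this and compares |t| with 2.056, the standard tabulated two-tailed 5% critical value for 26 degrees of freedom (n = 14 per group). It does not compute exact p values, which would require the t distribution from a statistics library.

```python
import math


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the mean difference
    return (m1 - m2) / se, df


# Tabulated two-tailed critical value of t for alpha = 0.05 and 26 df
T_CRIT_26 = 2.056

for name, existing, proposed in [
    ("land HRmax", (186, 13, 14), (173, 11, 14)),
    ("land Lacmax", (23, 3, 14), (8, 2, 14)),
    ("water HRmax", (166, 18, 14), (167, 15, 14)),
]:
    t, df = student_t(*existing, *proposed)
    verdict = "p < 0.05" if abs(t) > T_CRIT_26 else "p >= 0.05"
    print(f"{name}: t({df}) = {t:.2f}, {verdict}")
```

From the rounded summaries this gives t(26) of about 2.86 for land HRmax, 15.57 for land Lacmax and −0.16 for water HRmax, consistent with the significant land differences and the non-significant water HRmax difference reported above.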

Fig. 1. Comparison of a) HRmax, b) Lacmax and c) RPE6 - 20 for Group 1 (historical test) versus Group 2 (scientifically developed new test) for land winch rescue assessments. *Lactate monitor maximum value = 25 mmol.L–1.

Fig. 2. Comparison of a) HRmax, b) Lacmax and c) RPE6 - 20 for Group 1 (historical test) versus Group 2 (scientifically developed new test) for water winch rescue assessments.
The historical land test task duration was significantly longer than that of the new test (17±2 min vs. 7±2 min, p < 0.05, d = 5.00) and was nearly 4 min longer than the MAWD (13 min) determined in previous work [20]. The historical water test task duration was also significantly longer than that of the new test, although the absolute difference was small (12±2 min vs. 10±1 min, p < 0.05, d = 1.26), and was just over half the MAWD (21 min) for this task (Fig. 3).
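The comparison with MAWD reduces to a difference and a ratio between each mean task duration and the corresponding MAWD. The minimal Python sketch below uses the reported mean durations and MAWD values (13 min for land, 21 min for water). The MAWD regression from the Methods is included only to illustrate its exponential form; entering %VO2peak as a fraction (e.g. 0.86 for 86%) is an assumption, and the unit of its output is as defined in the source study [20].

```python
import math


def mawd(fraction_vo2peak: float) -> float:
    """MAWD regression quoted in the Methods: 95.336 * exp(-7.28 * %VO2peak).

    Assumption: %VO2peak is entered as a fraction (0.86 for 86%).
    Output unit as defined in the source study [20].
    """
    return 95.336 * math.exp(-7.28 * fraction_vo2peak)


def compare_to_mawd(task_min: float, mawd_min: float) -> tuple[float, float]:
    """Return (minutes over (+) or under (-) MAWD, duration as a fraction of MAWD)."""
    return task_min - mawd_min, task_min / mawd_min


# Mean historical-test durations and MAWD values (minutes) from the Results
for task, duration, limit in [("land", 17, 13), ("water", 12, 21)]:
    excess, ratio = compare_to_mawd(duration, limit)
    print(f"{task}: {excess:+.0f} min relative to MAWD ({ratio:.0%} of MAWD)")
```

With the rounded means, the land test runs +4 min beyond MAWD (about 131% of it) and the water test −9 min (about 57%). Because the regression is a decaying exponential, a higher relative intensity always yields a shorter acceptable duration.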

Fig. 3. Comparison of historical task duration versus new test duration and maximal acceptable work duration for a) land, b) water (*MAWD, as determined in: Meadley B, Horton E, Perraton L, Smith K, Bowles KA, Caldwell J. The physiological demands of helicopter winch rescue in water and over land. Ergonomics. 2022 Jun 3;65(6):828-41.).
Discussion
To our knowledge, this is the first study to compare historical physical selection tests with scientifically developed PES tests in helicopter rescue paramedics. We found that the historical land-based physical test imposed extreme physical demands, with mean maximal blood lactate values nearly three times those of the new, task-specific test. Further, the task duration of the historical land test exceeded MAWD. Conversely, the historical water-based test generated similar physiological strain to the new test, with a slightly longer duration, but differed from it in the technical competency required. These differences reflected job-specific technical requirements, which did not influence the physiological demands. For the historical land-based test in particular, there is a substantial risk of rejecting candidates who possess the required physical capacity and would in fact be capable of undertaking the job.
The importance of establishing scientifically robust and legally defensible physical employment standards for diverse occupational roles has a well-documented history [10, 25]. Over the last decade, there has been a shift towards adopting such scientifically validated assessments as standard practice, with an emphasis on injury prevention while accommodating individuals with illnesses or disabilities [26]. Nonetheless, certain occupations pose unique challenges that make it impractical to create tailored tests for each one [25]. For example, HEMS workforces are relatively small, which limits the collection of job-specific data. Consequently, it is crucial to ascertain whether existing tests can continue to offer stakeholders meaningful insight into an individual’s physical capacity and capability to undertake physically demanding responsibilities [27]. This extends to the wider ambulance service, as implementing new validated assessments requires substantial funding and participation from an already stretched workforce. In this case, there is a pressing need to evaluate whether the existing battery of tests can accurately gauge an individual’s fitness for the job. The implications are extensive, affecting not only the safety and efficiency of operations but also the legal and ethical aspects of workplace practice, and an overly demanding test may reduce the pool of candidates. The selection process may therefore inadvertently prioritise the most physically fit individuals over those best suited to the job on the basis of their skills, knowledge, and aptitude. The elevated physiological demands of the historical land test could also deter individuals who are genuinely fit for the role from pursuing employment.
To our knowledge, there are no published direct comparisons between the performance requirements of an existing paramedic test and those of a test developed using robust PES methodology. The Ottawa Paramedic Physical Ability Test [28] was developed from existing non-validated tests but was not compared with a new test. Other studies making direct test comparisons have been conducted primarily in military populations, comparing either the PES with generic predictive tests [29] or different variants of similar tests or tasks [30, 31]. The current study is the first to make direct comparisons between existing task-specific assessments and scientifically defensible assessments based on the six-stage framework [4].
Our results reveal a substantial disparity between the physical demands imposed by the historical land-based test and those of the newly developed land-based test. The historical land test elicited significantly higher heart rates, higher blood lactate concentrations, and a longer duration, indicating greater physiological strain. This observation has critical implications for recruitment and workforce composition. Cut-scores set too high tend to screen out more females than males, resulting in adverse impact [9]. One notable consequence of these pronounced differences in physical demands is the potential exclusion of applicants who possess the requisite physical capabilities for the job [7].
The unusually high blood lactate values seen in the historical land-based assessment (Lacmax 23±3 mmol.L–1) are likely to have arisen from a complex combination of factors. These may include carriage of a load exceeding 40 kg [32, 33] and impaired ventilation secondary to thoracic load bearing [33, 34]. Whilst data exist demonstrating high (>14 mmol.L–1) maximal lactate values during maximal exercise in elite athletes [35, 36] and trained subjects [37], during load carriage in young recreational athletes [32], and during submaximal and maximal exercise in elite soldiers [38], there are no published data from populations demographically comparable to the groups in this study (i.e., non-elite, mean age 37 years). Further work is required to investigate the effects of load carriage and pacing (arising from the consequences of failure in selection) on biomarkers of metabolic demand in this and similar occupations, accounting for demographic factors such as activity level and age.
The concept of MAWD has been examined in existing literature [39, 40]. In our study, the duration of the historical land-based test exceeded the MAWD for the land task, whereas the duration of the new land-based test was well within it. This disparity carries significant implications, predominantly that the workload demanded by the historical land-based test far exceeds what is necessary for job-specific tasks. The elevated work demand of the historical test provides further evidence that it is unsuitable for accurately assessing the physical aptitude of individuals seeking to become Intensive Care Flight Paramedics. This mismatch between the test’s demands and the job’s actual requirements highlights the need for more consistent, context-specific tools for evaluating the fitness of aspiring Intensive Care Flight Paramedics for land-based tasks.
Whereas the historical and new land-based tests differed markedly, the swimming tasks did not differ significantly in their physiological demands. This similarity in workload suggests that the historical test could potentially be incorporated into the new test battery. The primary distinction between the two water tests lies in the environment in which they are conducted: the new test takes place in open water, whereas the historical swim test occurs in the controlled environment of a 50 m pool. This distinction is important because it affects the logistics of test administration and the maintenance of test consistency. Conducting the swim test in a pool offers logistical advantages for the administering organisation, which is an important consideration when developing PES [25]. In addition, the controlled pool environment ensures a consistent testing scenario, eliminating variables introduced by outdoor conditions such as surf [41, 42] and water temperature [43]. The ability to keep test components consistent in a pool setting holds substantial merit for designing a scientifically defensible physical test.
Our findings demonstrate the need for a careful re-evaluation of the physical assessment standards employed in the recruitment process to ensure that they align with the actual demands of the job. This consideration is pivotal not only for promoting inclusivity and diversity but also for optimising the selection of candidates who possess the ideal blend of physical capability and job-related competencies.
Strengths and limitations
There are several limitations to this study. The main limitation is that it did not use a within-participant repeated-measures design. Ideally, the same group of participants would have undertaken all test protocols; however, this was not possible at the time. Firstly, the supporting organisation gave permission only for recruit paramedics (Group 1) to undertake the existing tests and not the proposed tests. Secondly, whilst it was planned for Group 2 to undertake the historical tests, this was prevented when COVID-19 pandemic restrictions came into force. By the time these restrictions were removed, too much time had elapsed (>1 year), all protocols would have had to be repeated, and the study was not funded to do this. Further, although the recruits undertook baseline physical fitness tests prior to these task-based assessments, the supporting organisation did not provide access to these results, so differences in the underlying fitness of participants in each group could not be compared. Lastly, whilst we would ideally present environmental data for testing days, the tests were undertaken on different days and at different times of day, as determined by the supporting organisation’s schedule. Reporting each individual result in the context of discrete environmental changes was not feasible, as is often the case in outdoor research. The authors acknowledge that environmental factors could affect physiological responses, but note that no extremes of temperature or climate were observed on testing days. Results from this study should be interpreted in the context of these limitations.
Sampling of expired air at multiple phases of each test, concurrent with other indices, would have allowed a more accurate and thorough comparison. However, this was precluded by limited funding, the COVID-19 pandemic, and the potential for expired-air sampling to interfere with a ‘live’ selection process, in which researcher interference may affect test outcomes. Future comparisons of historical and new PES tests should incorporate expired air analyses and exclude or account for any researcher impact on selection.
A major strength of this study is its demonstration of the potential impact of arbitrary physical tests when assessing suitability for a complex, physically demanding role. The land-test results, for example, highlight how arbitrary tests risk inequity, exclusion, and unnecessary and excessive strain that does not match job requirements.
Conclusion
The historical land-based physical test for helicopter rescue paramedics differed significantly from the proposed scientifically developed test, whereas the historical water-based test generated similar physiological strain to the proposed test, with a duration only slightly longer. Assessing physical capability using tests developed without established scientific methodologies may increase the risk of eliminating candidates with the required physical capacity to work in the role and, conversely, of including candidates who are not suitable.
Declarations
Ethical approval
The study was approved by the Monash University Human Ethics Research Committee (Project number 19051).
Informed consent
All participants in this study provided informed consent prior to participation.
Conflict of interest
The authors have no conflicts of interest to declare.
Acknowledgments
The authors would like to acknowledge and thank the intensive care flight paramedics who participated in this study. The authors would like to thank Ashleigh DeLorenzo and Rembrandt Bye who provided technical assistance during data collection.
Funding
Consumable items used in data collection were funded in part by a small research grant from the Australasian College of Paramedicine.
