Abstract
Naturally occurring thoracolumbar spinal cord injury (SCI) is common in dogs, and multi-center veterinary clinical studies can serve as translational tools to identify potentially effective therapies for human clinical trials. Assessment of gait is a key outcome, and several scales are used in dogs. The purpose of this study was to determine whether an international group of researchers could score gait reliably, to compare and contrast the performance of gait scales and to describe appropriate data analysis techniques. A training module was developed for a binary scale, modified Frankel Scale (MFS), Texas SCI Scale (TSCIS), and Open Field Scale (OFS). Raters viewed the training module, scored five training video clips to achieve proficiency, then scored 30 video clips from 10 dogs recovering from SCI. Interrater reliability was calculated, and correlation between scales was examined. Ceiling effect was described. Twenty raters with differing experience participated. The training module took 16 min to view. Raters chose identical binary outcomes in 597 of 600 observations. Intraclass correlation for MFS, TSCIS, and OFS was excellent at 0.85, 0.96, and 0.96, respectively, regardless of rater expertise. Ceiling effect occurred in all dogs that recovered ambulation, particularly using MFS and binary outcome. The TSCIS and OFS captured recovery of ambulatory dogs better, and addition of scores on hopping and proprioception mitigated ceiling effect. We conclude that gait in dogs with SCI can be scored reliably after training. A variety of different gait scales can be used in multi-center trials to capture outcome in different ways.
Introduction
Intervertebral disc herniation causing acute onset of paralysis is one of the most common diseases encountered at veterinary emergency clinics. 1,2 Because of the frequency and potential severity of this condition, there is an enormous body of retrospective outcome data published, 3,4 and there have been numerous randomized controlled trials (RCT) aimed at evaluating different therapies. 5–11 Moreover, the importance of this naturally occurring form of spinal cord injury (SCI) as a clinically relevant translational model of human SCI has been recognized. 4,12,13
CANSORT SCI is a consortium of veterinary clinicians and researchers established to enhance scientific discovery and therapy development for SCI using naturally occurring SCI in the dog. One of the most important components of performing research and clinical trials is having robust, reliable, discriminating and relevant outcome measures. 14–17 For SCI, assessment of gait is a key outcome measure.
There are a number of different ways to assess gait in pet dogs that have not been trained specifically for a particular functional test. These include allocating dichotomous outcomes (e.g., independent ambulation yes or no), using ordinal gait scales and generating continuous data through treadmill analysis, kinematic analysis and force plate or pressure sensor data collection. 18 The choice of analysis is influenced by the research question being asked, as well as important considerations around the resources available at participating study centers.
CANSORT SCI aims to perform multi-center RCT, and therefore there is a need to establish a training platform to ensure uniform outcome assessments across the CANSORT centers. There is also a need to evaluate the performance of different gait assessment tools in terms of interobserver repeatability, ceiling effect, and impact on clinical trial design.
We studied four well-established, previously validated, gait assessment tools for dogs with SCI, including a binary assessment of independent ambulation, a version of the modified Frankel Score (MFS), the Open Field Scale (OFS), and the Texas Spinal Cord Injury Scale (TSCIS). We hypothesized that raters of varying expertise can be trained to score gait in dogs with SCI reliably using an online training module.
The aims of this study were: first, to develop and test a gait assessment training tool; second, to compare and contrast different gait assessment scales for ease of use, interobserver repeatability, and ability to capture recovery without a ceiling effect; and third, to evaluate correlations between the systems and to identify the advantages and disadvantages of each system. The final aim was to provide details of statistical methods of analysis that can be used for the different scales.
Methods
Animals
Archived video clips from 15 dogs that participated in two different clinical trials at NC State Veterinary Hospital were used in this study. 8,9 All dogs were non-ambulatory paraparetic (n = 5) or paraplegic, with (n = 5) or without pain perception (n = 5) and had been unable to walk independently for less than 48 h at time of admission. All had an acute intervertebral disc herniation that was diagnosed by computed tomography or magnetic resonance imaging and treated surgically within 24 h of admission.
Dogs were hospitalized for their post-operative care, the details of which are provided elsewhere. 8,9 A neurological examination was performed daily while hospitalized, and dogs were videotaped the day after surgery (labeled day 1 for this study). Dogs were returned to their owners seven or 14 days after surgery (depending on the clinical trial) and returned for evaluation 14 and 42 days after surgery to establish their outcome. At each recheck their gait was videotaped, and a full neurological examination was performed and recorded. The results of proprioceptive placing and hopping scored using standard clinical ordinal scales (0: absent, 1: reduced, and 2: normal, each limb scored separately) and recorded at the time of gait videotaping were also used in this study.
Videotapes of gait
Dogs were videotaped walking on a non-slip surface using previously established protocols, and the video clips were archived. 19 For the training phase of the study, five video clips (not used in the test phase) that captured five dogs at different stages of recovery (paraplegic, non-ambulatory paraparetic, ambulatory but ataxic) were selected and named using the letters GP and numbers one through five. For the testing phase of the study, video clips from 10 dogs on days 1, 14, and 42 after surgery were used. Video clips were named using the word “Test” and numbers 1 through 30. A key to the identity of each video clip (dog study number and observation time point) was kept in a separate location.
Raters
There were 20 raters from eight CANSORT centers in the United States, United Kingdom, Germany, and Switzerland, including nine American or European board-certified veterinary neurologists, one board-certified large animal internist, two neurology residents, five interns, two veterinary students, and one research technician.
Gait assessment systems
Four different gait scales were evaluated in this study. The first was a simple binary categorization as ambulatory, yes or no. The second required categorization as paraplegic, non-ambulatory paraparetic, ambulatory paraparetic, or normal (corresponding to an MFS using gait assessment only). The third and fourth gait scales were the TSCIS (gait assessment only) and the OFS (gait assessment only), respectively 5,20 (Supplementary Data S1).
Development of training module
Two authors (NJO, JL) developed a digital training module (Supplementary Data S2). The narrated module provided definitions of key components of each gait scale along with video clips that illustrate each definition. Feedback from the group of raters after initial training was used to improve the module.
Training phase
All raters were provided with a link to the training module and five training video clips as well as an instruction and scoring sheet (Supplementary Data S3). Raters returned the score sheet that included space for questions and comments, and this feedback was used to provide further clarification on specific points and to update the scoring sheet. Once raters' points of discussion had been addressed, they moved on to the testing phase.
Testing phase
Raters viewed 30 test video clips and completed an assessment form for all four systems (Supplementary Data S3). Raters were allowed to refer back to the training module as needed. The identity of the dog and time point of each video clip was revealed only to the statistician to generate recovery curves for each dog. The results of testing proprioceptive placing and hopping, retrieved from each dog's clinical report form associated with their respective trial, were also entered for each time point.
Statistical analysis
Scoring consistency
The data were compiled in R and organized by rater and assessment system (gait scale). For the purposes of analysis, the gait categories in the MFS scale were coded as numbers (1 = paraplegic, 2 = non-ambulatory paraparetic, 3 = ambulatory paraparetic/ataxic, 4 = normal) and the ambulatory status yes or no was coded as a binary 1/0 with 1 = Yes and 0 = No.
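The coding scheme described above can be sketched in a few lines. This is an illustrative Python sketch rather than the R code used in the study; the label strings and the function name `encode_observation` are assumptions, and only the numeric codes come from the text.

```python
# Numeric codes taken from the text; the label spellings are assumed for illustration.
MFS_CODES = {
    "paraplegic": 1,
    "non-ambulatory paraparetic": 2,
    "ambulatory paraparetic/ataxic": 3,
    "normal": 4,
}
AMBULATORY_CODES = {"yes": 1, "no": 0}


def encode_observation(mfs_label, ambulatory_label):
    """Return (mfs_code, ambulatory_code) for one rater's score of one video clip."""
    mfs_code = MFS_CODES[mfs_label.strip().lower()]
    ambulatory_code = AMBULATORY_CODES[ambulatory_label.strip().lower()]
    return mfs_code, ambulatory_code
```

An unrecognized label raises a `KeyError`, which is one simple way to flag responses such as ranges or fractional grades for manual review.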
Interrater reliability was evaluated by calculating intraclass correlation coefficient (ICC) for MFS, TSCIS, and OFS. When comparing responses for ambulatory status, all raters' observations were used to calculate the mean score of each data point (each dog at each time point) and inspected for values that were not equal to 0 or 1, which would indicate disagreement. The correlation of scoring between the MFS, TSCIS, and OFS assessment systems was assessed by calculating the Cronbach alpha. To determine whether there were differences between raters with differing experience, the raters were separated into skilled raters (faculty and residents, n = 12), and unskilled raters (interns/veterinary students/technicians, n = 7), and the correlation between scoring systems and the Cronbach alpha were recalculated.
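For readers who want to reproduce these statistics, the following Python sketch shows one way to compute them. The text does not state which form of the ICC was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here; all function names are hypothetical, and missing observations are not handled.

```python
from statistics import mean, variance


def icc_2_1(scores):
    """ICC(2,1) for a complete table: rows = video clips, columns = raters."""
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-clip mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cronbach_alpha(items):
    """Cronbach alpha; items = one list of per-clip scores for each scale."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]
    summed_item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - summed_item_variance / variance(totals))


def disagreement_clips(binary_scores):
    """Indices of clips whose mean 0/1 ambulatory score is neither 0 nor 1."""
    return [i for i, row in enumerate(binary_scores) if mean(row) not in (0, 1)]
```

Each function expects a complete rectangular table, so observations that were excluded (for example, ranges or fractional grades) would need to be handled before calling them.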
Gait scoring system performance
Data were sorted according to whether the dog was considered ambulatory or not using the first, binary categorization, and the frequency with which the different scores for the MFS, TSCIS, and OFS were used was plotted on bar graphs for ambulatory versus non-ambulatory observations. To compare how the different scoring systems captured the recovery of dogs with SCI, the mean of all raters' scores was calculated for each video clip for MFS, TSCIS, and OFS. These data were used to generate recovery curves for each dog and evaluate for the presence of a ceiling effect. The effect of adding the ordinal scores for hopping and proprioceptive placing to the mean gait scores on the recovery curves was examined.
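The construction of a recovery curve with and without the additional ordinal scores can be sketched as follows. This is a minimal Python illustration, not the study code: the data layout and function names are hypothetical, and simple addition of per-limb proprioceptive placing (PP) and hopping scores (0 to 2 each) to the mean gait score is an assumption about how the scores were combined.

```python
from statistics import mean


def combined_score(rater_scores, pp_limbs=(), hop_limbs=()):
    """Mean gait score across raters plus summed per-limb PP and hopping scores."""
    return mean(rater_scores) + sum(pp_limbs) + sum(hop_limbs)


def recovery_curve(clips, use_pp=True, use_hop=True):
    """Map each observation day to a score for one dog.

    clips: {day: {"gait": [rater scores], "pp": (per-limb), "hop": (per-limb)}}
    """
    curve = {}
    for day, clip in sorted(clips.items()):
        pp = clip["pp"] if use_pp else ()
        hop = clip["hop"] if use_hop else ()
        curve[day] = combined_score(clip["gait"], pp, hop)
    return curve
```

With gait scores alone, a dog that reaches the top of the scale by day 14 shows a flat curve thereafter; including PP and hopping lets later improvement in those tasks remain visible.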
Power analyses and data analysis
Different statistical approaches to power analyses and data analysis for each outcome assessment are presented.
Results
Training phase
The training module we developed was 16 min and 37 sec long (Supplementary Data S2). Key definitions incorporated included ambulatory status, which was defined as follows: “Ambulatory animals are able to take 10 consecutive weight-bearing steps. Weight-bearing is defined as placing a hind foot level with or cranial to the hip and extending the stifle to lift the hind quarters level with the thoracic spine.” Feedback after the initial training session led to the addition to the scoring sheet of “Position of foot (dorsum vs. plantar aspect) is irrelevant.”
It was also necessary to clarify that these were consecutive hindlimb steps. An addition was made to the scoring sheet for the test phase to note that the ambulatory paraparetic category also included ataxic dogs; thus, the category was called “ambulatory paraparetic and/or ataxic.” This addressed the issue of raters stating that dogs were normal using this gait assessment scheme while simultaneously scoring them as abnormal on the TSCIS and the OFS because of mild hindlimb ataxia without obvious weakness. Raters reported no problems related to the training module for the TSCIS and the OFS, for which detailed definitions had been developed previously. 19,20
Testing phase consistency of scoring
All raters completed their scoring successfully with no significant issues reported. There were nine of 600 observations where the rater indicated that he or she was unsure, gave a range for a response, or assigned a fractional grade. These data points were excluded from the analysis.
The ICC values for the MFS, TSCIS, and OFS were excellent at 0.85, 0.96, and 0.96, respectively. The binary allocation to ambulatory or non-ambulatory showed perfect agreement in 27 video clips. In each of the remaining three video clips, one rater (not necessarily the same individual) disagreed with the other 19 raters.
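The binary agreement figures can be checked with simple arithmetic. The short Python sketch below assumes that every rater gave a binary score for every clip; the variable names are illustrative.

```python
RATERS = 20
CLIPS = 30
unanimous_clips = 27   # all 20 raters agreed
split_clips = 3        # one dissenting rater in each clip

total_observations = RATERS * CLIPS
agreeing_observations = unanimous_clips * RATERS + split_clips * (RATERS - 1)
agreement_rate = agreeing_observations / total_observations
```

Under that assumption, the counts correspond to the 597 of 600 identical binary outcomes reported in the Abstract.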
Overall consistency was excellent. The correlation between the MFS, TSCIS, and OFS is provided in Table 1. Correlation among all three scales was high, with r² values of 0.89 or greater. The consistency of score assignment was excellent between TSCIS and OFS (Cronbach alpha of 0.99), but was lower when considering the MFS. When data were separated based on experience levels, these values did not change substantially, but there was a trend for lower agreement between the MFS and the OFS/TSCIS for the group with the least experience (Table 2).
Table 1. Pairwise Correlation and Cronbach Alpha among Different Scoring Systems
TSCIS, Texas Spinal Cord Injury Scale; OFS, Open Field Scale.
Table 2. Pairwise Correlation and Cronbach Alpha among Different Scoring Systems with Raters Divided by Expertise
TSCIS, Texas Spinal Cord Injury Scale; OFS, Open Field Scale.
Gait scale performance
When the frequencies of scores assigned were plotted after division into ambulatory and non-ambulatory categories, with infrequent exceptions, scores were distributed evenly (Fig. 1). The frequency with which scores were assigned did differ between scales. Within the MFS, dogs were most commonly assigned to the ambulatory paraparetic group because of the frequency with which dogs recover ambulation but do not have a normal gait. These data demonstrate the difficulty of separating level of recovery once the definition of ambulatory has been met using this scale and the binary ambulatory yes or no scoring system.

Fig. 1. Column graphs of the frequency of scores assigned.
The OFS and TSCIS had more even distribution of scores because of the higher granularity of these scales. The TSCIS was more likely to have even scores assigned once dogs were ambulatory. This is a result of scoring each limb separately and adding the scores together. Marked lateralization of clinical signs is unusual in surgically treated intervertebral disc extrusion (IVDE); thus, each limb scored the same, producing an even score after addition. By contrast, the OFS scores the hindlimbs together, resulting in a smooth distribution of scores across odd and even numbers in this disease process.
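The parity argument can be illustrated with a short Python sketch. A per-limb TSCIS gait score of 0 to 6, summed over the two hindlimbs, is assumed here from the description of the TSCIS as a 12-point scale; the function name is hypothetical.

```python
def tscis_gait_total(left_hindlimb, right_hindlimb):
    """Total TSCIS gait score as the sum of the two per-limb scores (assumed 0-6 each)."""
    return left_hindlimb + right_hindlimb


# Symmetrical deficits: both hindlimbs receive the same score, so totals are always even.
symmetric_totals = [tscis_gait_total(score, score) for score in range(7)]

# Lateralized deficits: a one-point difference between limbs produces an odd total.
lateralized_totals = [tscis_gait_total(score, score + 1) for score in range(6)]
```

Odd totals therefore appear only when the two hindlimbs are scored differently, which is uncommon in this disease process.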
Inspection of the recovery curves of each dog demonstrates that the ceiling effect can be a problem with all scales in dogs that recover ambulation. Addition of the hopping and proprioceptive placing scores, either alone or in combination, resolves this issue in these dogs and has no impact on dogs that are not ambulatory (Fig. 2).

Fig. 2. Recovery curves of individual dogs over time (days) using the Open Field Scale (OFS, upper two rows) and the Texas Spinal Cord Injury Scale (TSCIS, lower two rows) in isolation or combined with ordinal scores of proprioceptive placing (PP) and hopping (Hop) or both. Scores for dogs that do not recover ambulation (scores less than 6) such as dogs A12 and A15 are not affected by adding in PP and Hop, while the ceiling effect seen at scores of 10–12 when using gait scores alone is resolved by addition of either PP or Hop or both scores.
Statistical analysis
Suggested approaches to power and data analysis are provided in Table 3. Analysis method depends on whether data are categorical (binary or multiple level), ordinal, or continuous. Choice of statistical method is influenced by the sophistication of the statistician, and simple analytic approaches have been provided if a statistician is not available.
Table 3. Approaches to Power and Data Analysis
Discussion
In this study, an international group of researchers with widely differing expertise was trained to score gait with an online tool using four different scales. Training was rapid and effective with excellent interrater agreement for all gait assessment instruments. Agreement between the two 12-point gait scales (TSCIS and OFS) was extremely high, while the correlation with the MFS was good but agreement was lower. All of the gait assessment instruments have a ceiling effect in dogs that recover the ability to walk, but addition of ordinal measures of more complex tasks, such as proprioceptive placing and hopping, addresses this issue.
It is necessary to undertake multi-center studies to perform well powered, rapidly executed clinical trials in canine SCI. A big challenge when performing these multi-institutional, multi-investigator studies is identifying outcome measures that can be deployed reliably by all centers. Outcomes commonly used in naturally occurring canine SCI clinical trials using pet dogs are based around gait, more complex tasks such as proprioceptive placing, presence of nociception, and voluntary urination.
Gait is the single most commonly used outcome because of the overwhelming importance of the ability to walk for pet owners. Gait assessments include numerous different versions of ordinal gait scales organized around the Tarlov and Frankel Scales, as well as binary categorizations on success of recovery and three different expanded ordinal gait scales. 18–21 Treadmill based quantification, 22 pressure sensitive walkway based, 23,24 and kinematic 23,25,26,27 outcome measures also exist, but these are more difficult to deploy across numerous centers. In this study, we evaluated four different gait scoring systems to determine whether they could be used reliably by different investigators of differing experience and different backgrounds.
Development of the training module was straightforward. The use of video clips of dogs showing key characteristics for important definitions of different gait categories facilitated understanding. A consistent definition of ambulatory was used across all gait categories. This definition required 10 consecutive weight-bearing steps, and while the adoption of a specific number of steps is somewhat arbitrary, this definition has been used in several clinical trials to date and represents a consistently ambulatory dog. 6–8,11
The training required viewing of the 16-min training module and scoring of five training videos that included stereotypical as well as challenging gaits. Overall, the training could be completed within 45 min, and feedback from this training session was positive. The feedback forms did highlight three definition points that required further clarification, and this clarification was provided to the group and added to the scoring sheets for reference before the testing phase.
The testing phase used 30 video clips from 10 dogs assessed at 1, 14, and 42 days after surgery and spanned dogs that were paraplegic to dogs that were walking normally. Interobserver consistency was extremely high across all gait assessment systems, with the MFS proving the least reliable. The TSCIS and OFS have very precise definitions for each gait category, and this was reflected in ICC values of 0.96. Of particular note is the absence of difference in reliability when experienced raters (neurology faculty and residents) were compared with inexperienced raters (interns, veterinary students and research technicians). We conclude that these four gait scales can be used reliably by completing the training, regardless of expertise.
When the scoring systems were compared with each other, the TSCIS and OFS correlated almost perfectly with a Cronbach alpha value of 0.99. Indeed, these 12-point scoring systems, while defining each ordinal score differently, essentially capture the same recovery curve and could be used interchangeably. When looking at the frequency of distribution of scores, however, the TSCIS resulted in even scores being assigned more commonly once these dogs were ambulatory, with odd scores used rarely. By contrast, the OFS resulted in a more uniform use of scores. This was a natural result of the symmetry of signs in the majority of these dogs.
We suggest that the TSCIS, by scoring each limb individually, is particularly suited to capturing recovery in independently ambulating dogs with diseases that cause marked lateralization of signs, such as fibrocartilaginous embolism and acute non-compressive nucleus pulposus extrusions. The OFS will provide a more even representation of recovery once dogs with symmetrical SCI have regained the ability to independently ambulate and will therefore capture subtle differences in this phase of recovery if relevant.
An important consideration of any scoring system is whether there is a floor and ceiling effect. Given that dogs with very different severities of SCI become paraplegic, reflected by the range of recoveries seen in this population, 28,29 any gait assessment system will suffer from a floor effect. This is true in experimental models of SCI and in people, where spontaneous recovery after severe injuries is well documented. 30,31 In dogs, this floor effect is overcome in part by assessing presence of sensation, but a 40% recovery rate is seen in dogs with clinically complete SCIs, emphasizing the floor effect. 32–34 Ultimately, the floor effect can be mitigated further by measuring serum glial fibrillary acidic protein concentrations. 35
The ceiling effect was evident when using binary categorization and the MFS and was still present in dogs that recovered ambulation when using the TSCIS and OFS. Addition of scoring of proprioceptive placing and hopping, more complex tasks that are a routine part of the canine neurological examination, addressed this issue extremely well. These assessments have been historically incorporated into the TSCIS in clinical trials 6,20,36 and have been considered separately when using the OFS. 5,8,9 Either option presents a powerful remedy to the ceiling effect in dogs that recover.
Training and testing of the scoring systems was performed using video clips of dogs walking on the same carpet surface. It is important to understand that the walking surface and conditions can change ambulation dramatically, and a uniform approach to gait assessment in a clinical trial should always define the surface and conditions for the assessment to be made.
When designing a clinical trial, a power analysis should be performed to determine group size. Central to that analysis is the selection of a primary outcome for which there are preliminary data in the test population and an estimate for expected change in outcome with treatment. The outcome should be relevant to the patient population and the test agent. If selecting a patient population for which recovery of ambulation is likely, selecting an expanded gait scale with a high ceiling and looking at the rate of recovery become important. When testing in a patient population in which recovery rate is low, selecting a binary outcome may be more relevant.
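As an illustration of a power analysis for a binary primary outcome, the Python sketch below estimates the group size needed to compare two recovery proportions using the standard normal-approximation formula. It is one common approach and not necessarily the method recommended in this article; the function name and the default values for alpha and power are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def group_size_two_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Dogs needed per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


# Hypothetical planning values: 40% recovery in controls versus 70% with treatment.
n_per_group = group_size_two_proportions(0.40, 0.70)
```

The required group size rises steeply as the expected treatment effect shrinks, which is why preliminary data on the outcome in the test population matter so much at the design stage.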
Many different approaches to data analysis are possible, and it is important to note that defining absolute thresholds within a scale or thresholds for change in score can facilitate a binary analysis examining many different aspects of recovery. It is not uncommon for expanded ordinal scales to be treated as continuous data, and that may be particularly appropriate when looking at rate of change of score. Indeed, comparison of statistical results when analyzing ordinal data as ordinal versus continuous data frequently does not generate different results. 37 Treating those data as ordinal, however, is ideal where possible, and so the recommendations made used an ordinal approach.
Conclusion
Raters with a range of experience can be trained to score hindlimb gait deficits in dogs with thoracolumbar SCI with a high degree of consistency by viewing a training module combined with practice videos. Scoring dogs' gait using binary outcomes and ordinal scales with differing granularity allows capture of the recovery curve. All scales suffer from a floor effect, and using broader categories results in a ceiling effect, but this can be mitigated by using more expanded scales and combining them with assessments of hindlimb proprioception and ability to hop. There is a wide variety of ways to capture recovery of dogs with acute SCI that can be deployed across multiple institutions to perform multi-center clinical trials.
Acknowledgments
The video clips used in this study were generated in clinical trials funded by Assisi Animal Health and Morris Animal Foundation.
Funding Information
There was no funding in direct support of this work.
Author Disclosure Statement
No competing financial interests exist.
Supplementary Material
Supplementary Data S1
Supplementary Data S2
Supplementary Data S3
CANSORT SCI Investigators
Ronaldo da Costa, The Ohio State University, Columbus, OH; Joseph Fenn, Royal Veterinary College, Hertfordshire, UK; Nicolas Granger, Royal Veterinary College, Hertfordshire, UK; Nick Jeffery, Texas A&M University, College Station, TX; Sarah Moore, The Ohio State University, Columbus, OH; Yvette Nout-Lomas, Colorado State University, Fort Collins, CO; Veronika Stein, University of Bern, Bern, Switzerland; Andrea Tipold, University of Veterinary Medicine Hannover, Hannover, Germany. Raters for the study, not CANSORT members: Jackie Blake, North Carolina State University College of Veterinary Medicine, Raleigh, NC; Guillaume Dutil, University of Bern, Bern, Switzerland; Ashley C. Hechler, The Ohio State University, Columbus, OH; Enrice Huenerfauth, University of Veterinary Medicine Hannover, Hannover, Germany; Marta Karn, Colorado State University, Fort Collins, CO; Anna Knebel, University of Veterinary Medicine Hannover, Hannover, Germany; Nina Meyerhoff, University of Veterinary Medicine Hannover, Hannover, Germany; Jasmin Nessler, University of Veterinary Medicine Hannover, Hannover, Germany; Bethany Pastina, North Carolina State University College of Veterinary Medicine, Raleigh, NC; Alexandra Stachel, North Carolina State University College of Veterinary Medicine, Raleigh, NC; Antja Watanangura, University of Bern, Bern, Switzerland.
References
