Abstract
Autologous bone grafting is the clinical gold standard for the treatment of large bone defects, but it is available only in limited amounts and is associated with donor site morbidity. These challenges might be overcome by tissue engineering (TE). Although promising results have been reported, translation into clinics often fails. Lack of reproducibility in preclinical studies may be one of the reasons. We evaluated preclinical models for testing novel TE strategies, as well as the perceptions of researchers and clinicians toward these models. To this end, publications on preclinical models from the past 10 years were reviewed. In addition, a survey addressed to both clinicians and scientists was conducted to assess the clinical need for bone tissue engineering (BTE) constructs, and researchers were asked about their satisfaction with the currently available preclinical models. The literature review identified 169 articles on in vivo studies in the field of BTE, including 26 studies using large animal models and 143 studies in small animals, with rabbits and rats being the most commonly used species. Only a few studies used skeletally mature animals, in stark contrast to the targeted patients. The localization of the bone defects varied, but the vast majority (60%) were segmental bone defects with various fixation techniques. Results of 70 surveys confirmed a great clinical need for TE constructs and positive perceptions of all participants toward their future clinical application. Nevertheless, participants indicated a need to optimize preclinical models and pointed to limitations in translating results to the clinical situation. No clear trend emerged as to which preclinical model led to the most satisfying results, although scientists generally rated large animal models higher than small animal models. The results of the literature review and the survey reveal a lack of standardized methods.
Despite the affirmed clinical need as well as a very positive perception of clinicians toward the use of TE, results indicate a critical need to optimize preclinical models and, in particular, improve translational aspects of the models. A consensus in the field on a limited number of well-standardized models should be reached.
Impact statement
Preclinical models to evaluate tissue engineering (TE) constructs are still needed. For ethical, scientific, and economic reasons, their results should be reliable and reproducible. The chosen animal model and its parameters (e.g., species, age, gender, defect localization) can have a great impact on the results. Researchers need to be aware that not only their novel TE strategy matters, but also the model they use to demonstrate safety and efficacy. This article analyzes the current situation based on a literature search and on the perceptions of researchers and clinicians. It demonstrates a clear need for optimized, better defined, and agreed-upon models for bone tissue engineering, and for more detailed descriptions of these models in publications.
Introduction
Trauma, tumor resection, and skeletal abnormalities can cause complex bone defects, especially under conditions of compromised healing such as infection, avascular necrosis, atrophic non-union, and osteoporosis, imposing a significant burden on the patient. When the self-repair mechanisms of bone reach their limits, osseous reconstruction requires a considerable quantity of bone graft to restore the form and function of the affected bone and, ultimately, to improve quality of life.1,2
With bone being the second most transplanted tissue after blood, there have been numerous attempts to reconstruct bone and thereby restore its structural and functional integrity.3 Autologous and allogeneic bone graft materials, synthetic bone substitutes, the use of growth factors and living cells, distraction osteogenesis, and the Masquelet technique are current clinical strategies with relatively satisfactory outcomes for the restoration of defects with limited intrinsic regenerative potential.1,3,4 Although autologous bone graft represents the gold standard in daily clinical routine, it has, like the other options, disadvantages regarding cost, efficacy, and limited availability.1 The field of bone tissue engineering (BTE) has emerged to overcome these limitations.5–7
For more than 30 years, scientists worldwide have been pursuing tissue-engineered strategies to replace damaged bone, and the number of published articles on this topic has increased dramatically, from <250 articles on PubMed between 1985 and 1987 to >4000 articles in 2011.8 Despite the enormous number of BTE studies published at the preclinical level, only a small number of these approaches have made their way into clinical use. This gap between research and clinical translation or commercialization is commonly referred to as the “Valley of Death”; it was acknowledged in the field of tissue engineering (TE) 10 years ago,9 but it has not yet been fully resolved in any area of translational research.10 Several reasons have been proposed, including the lack of tissue-engineered constructs with not only osteoconductive but also osteoinductive properties.3 A recent review article addressed the state of the art of BTE in 2018 and outlined the remaining shortcomings.7 The authors particularly stressed that BTE strategies may be improved by considering the physiological healing cascade and may eventually have to follow a personalized-medicine approach.
An important step toward the clinical application of an advanced therapy medicinal product, such as a tissue-engineered construct, is its preclinical testing, which has likewise been recognized as a major hurdle on the way from bench to bedside.11,12 To date, no common guideline for the preclinical testing of tissue-engineered bone constructs exists. Nevertheless, a variety of preclinical models for orthopedic research, and for testing BTE in particular, have been applied, some of which were described and summarized in a recent review.13 The models chosen to test BTE appear to vary massively, not only in their definition but also, among other aspects, in animal species, strain or breed, age and gender of the animals, anatomical localization of defects, defect type and size, fixation methods, and the applied outcome measures. This observation is not specific to bone defect models and has likewise been discussed in the context of osteochondral and cartilage defects.14,15
In light of the large discrepancy between the research efforts in BTE, its very limited translational success, and the importance of preclinical testing in this process, we sought to gain insight into (1) which preclinical models have been applied and (2) how scientists and clinicians rate these models. Therefore, a literature search was conducted to assess which preclinical models have been most frequently applied within the past 10 years, with emphasis on the model specifications. To investigate not only the state of the art of the preclinical models in use but also their perceived success, we conducted a survey to collect experiences with and opinions on these models from institutions conducting research in BTE techniques, as well as from designated experts in human and veterinary surgery.
Methods
Literature search
An online literature search was conducted to evaluate preclinical animal models currently used for testing BTE constructs. We focused specifically on orthotopic models, since they represent the most critical step for translating preclinical research into clinical studies. The electronic database PubMed was searched for relevant English-language literature published between January 2008 and May 2018. The inclusion criteria were defined and applied in PubMed as follows:
Search: “tissue engineering” AND “bone” AND (“bone defect” OR “fracture”) AND (“tibia” OR “femur” OR “cranium” OR “calvaria” OR “radius” OR “humerus” OR “ulna” OR “maxilla” OR “mandible”) AND (“in vivo” OR “animal model” OR “mouse” OR “mice” OR “rat” OR “sheep” OR “goat” OR “rabbit” OR “horse” OR “pig” OR “dog” OR “murine” OR “equine” OR “porcine” OR “canine” OR “ovine” OR “rodent”).
The key words “tissue engineering” and “bone” were entered as separate terms to keep the search broad and retrieve more results. The aim was to evaluate orthotopic models, including a “bone defect” or “fracture” in the most commonly used bones, namely tibia, femur, cranium (calvaria), radius, humerus, ulna, maxilla, and mandible. Animal species terms were selected to cover the large and small animal models in general use, and the terms “in vivo” and “animal model” were added to further broaden the search. Reviews were excluded, as were articles describing purely in vitro studies or in vivo studies using ectopic models.
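The search string above is a conjunction of fixed terms and OR-groups. As a minimal sketch (not the tooling actually used for this review), the hypothetical Python helpers below rebuild it from term lists, which makes it easier to see, and to modify reproducibly, which terms are combined with AND and which with OR. Straight quotation marks are assumed, as used in PubMed query syntax; the date restriction (January 2008 to May 2018) is assumed to be applied separately as a filter and is not part of the string.

```python
# Hypothetical helpers: rebuild the Boolean PubMed query described in the
# Methods as an AND of fixed terms and OR-groups. Term lists are copied from
# the search string; the date filter is assumed to be applied separately.

BONES = ["tibia", "femur", "cranium", "calvaria", "radius",
         "humerus", "ulna", "maxilla", "mandible"]

MODEL_TERMS = ["in vivo", "animal model", "mouse", "mice", "rat", "sheep",
               "goat", "rabbit", "horse", "pig", "dog", "murine", "equine",
               "porcine", "canine", "ovine", "rodent"]


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query():
    """Combine the fixed terms and the OR-groups with AND."""
    parts = [
        '"tissue engineering"',
        '"bone"',
        or_group(["bone defect", "fracture"]),
        or_group(BONES),
        or_group(MODEL_TERMS),
    ]
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query())
```

Keeping the term lists separate from the combining logic means that adding a species or bone changes one list rather than a long hand-edited string.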
Data collected from each publication included animal species, age, gender, strain or breed, anatomical location of the bone defect, defect size and shape, and the fixation method applied.
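The parameters listed above lend themselves to a structured extraction record. The sketch below is a hypothetical schema (class and field names are assumptions, not the authors' actual data sheet) in which an unreported parameter is stored as None, so that the share of studies reporting a given parameter, such as animal age, can be computed directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StudyRecord:
    """Hypothetical extraction record for one publication.

    None means the parameter was not reported in the publication.
    """
    species: str
    age_weeks: Optional[float] = None       # age at study entry
    gender: Optional[str] = None            # e.g. "male", "female", "both"
    strain_or_breed: Optional[str] = None
    defect_location: Optional[str] = None   # e.g. "femur", "radius", "cranium"
    defect_type: Optional[str] = None       # e.g. "segmental", "drill hole"
    # Free text, because diameter, gap length, area or volume may be given
    defect_size: Optional[str] = None
    # Use "none" for defects that need no fixation; None means not reported
    fixation: Optional[str] = None


def reporting_rate(records: List[StudyRecord], field: str) -> float:
    """Return the fraction of records in which `field` was reported."""
    if not records:
        return 0.0
    reported = sum(getattr(r, field) is not None for r in records)
    return reported / len(records)
```

Distinguishing "none" (no fixation needed) from None (not reported) keeps missing reporting separate from a genuine design choice, which matters when reporting completeness is itself an outcome of the review.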
Survey
The target participants were, on the one hand, clinicians with a professional background in orthopedic, trauma, craniomaxillofacial, or veterinary surgery and, on the other hand, scientists conducting research in bone biology and regeneration. The questionnaire was distributed in hard copy (Supplementary Data) as well as web based (
The survey was divided into four parts. The first part addressed the professional background and experience level of the participants. In the second part, clinicians and scientists were surveyed separately about their work. Clinicians were asked to indicate the number of cases they treat with bone graft/bone substitutes per year, and two open text fields allowed them to list the most frequent indications for bone grafting. In addition, clinicians were asked how many of the cases requiring bone grafting they would treat with tissue-engineered bone constructs if these were available. Scientists were asked to list the indications they considered to require bone grafting and to compare these with the indications targeted by their research. The third part of the questionnaire evaluated the participants' perceptions of BTE, with questions about the relevance of BTE and its potential future clinical application; this part also addressed the participants' opinions on the currently used preclinical models. Finally, in the fourth part, participants were asked to describe the animal models they use in detail (animal species, age, gender, strain, observation time and methods, implantation site, defect model, defect size and type, fixation methods). Participants were further asked to distinguish between their most and least satisfying model designs and to assess the clinical relevance of the models.
Results
Animal models used for testing of bone replacement materials in the past 10 years
The literature review identified 167 articles on in vivo studies in the field of BTE. These comprised 26 studies in large animals, whereas the remaining 141 studies were performed in small animals. No correlation was found between the applied models and the impact factor of the journals in which they were reported (data not shown). Rabbits were used most frequently (41.4% of all studies), followed by rats (32%) and mice (10.7%). Among the large animals, sheep were used most frequently (4.7%; Fig. 1A). Other large animal models applied for testing BTE constructs included dogs (3.6%), goats (3.6%), and pigs (1.8%). None of the commonly used animal models showed a clear preference for a specific gender, nor was outcome evaluated by gender (Fig. 1B). Notably, a large proportion of studies either did not report the gender of the animals used (27%) or used animals of both sexes (8%). We next assessed the age of the experimental animals (Fig. 1C, D). Again, not all studies reported the age of the animals at study entry; in fact, only 49% of the analyzed studies provided this information. Among those that did, age varied widely. For example, sheep aged between 1 and 7 years have been used in bone defect models without a stated rationale, such as comparing healing in old versus young animals (Fig. 1C). For small animals, the average age was 19.9 ± 6.4 weeks for rabbits, 11.7 ± 4.3 weeks for rats, and 10 ± 2.3 weeks for mice (Fig. 1D).

Species, sex, and age of animals used to test BTE strategies from 2008 to 2018.
Assessment of the defect models reported in the past 10 years shows that no preferred model exists for any of the animal species used; instead, a huge variety of models has been applied (Fig. 2; Supplementary Table S1). In general, the greatest variation was observed in mandibular defects, in particular in terms of exact anatomical localization, defect shape, and size, which makes it virtually impossible to compare these studies even if they were performed in the same species. In large animals, drill hole defects were created at different anatomical localizations, including non-load-bearing parts of long bones, namely femur, humerus, and tibia, and the cranium as a flat bone. No consensus between studies could be found with respect to the size of such defects, not even when the same defect model was used. For example, in sheep, the diameter of drill holes in the femoral condyles ranged from 5 to 8 mm. Segmental defect models have also been used in the femur, radius, or tibia of large animals. In addition to great discrepancies in defect size, these studies used different fixation methods, comprising external fixation, internal plates, and intramedullary fixation, without justifying the choice in terms of distinct surgical strategies to be tested. In mice, segmental femur defects were reported most often, with gap sizes ranging between 1 and 5 mm and various fixation strategies, including intramedullary pin fixation, internal plates, and external fixation devices.

Summary of preclinical models used to test BTE strategies from 2008 to 2018. n = 167. For defect size, diameter is given for drill hole defects, gap size for segmental defects, and defect area or volume for other volumetric defects. Due to extreme heterogeneity in defect location and size in mandibular defects, no summary is possible for these defects. Please refer to Supplementary Table S1 for a detailed summary of the used models.
In rabbits, segmental radius defects were used most frequently (29 of 61 rabbit studies), followed by drill hole defects in the femur (11 studies) and mandibular defects (9 studies). The radius defects, which did not require fixation, again varied massively in gap size, ranging between 5 and 20 mm. In rats, the most popular defect models were segmental femur defects (23 of 55 studies) and drill hole defects in the cranium (11 studies). As with segmental defects in other species, gap sizes in rat femur defects varied between 1 and 10 mm, and fixation methods varied as well.
Finally, we assessed which outcome measures were applied in the different studies. Histological analysis was applied in 92% of studies. New bone formation was assessed by tomography (mostly micro-computed tomography) and radiography in 65% and 62% of studies, respectively, whereas biomechanical testing was performed in only 34% of studies. A small proportion of studies applied further analyses such as electron microscopy (10%), gene and protein expression analysis (10%), or fluorochrome labeling (9%). Seven percent used other outcome measures.
Perceptions of scientists and clinicians toward clinical availability of BTE in the future
A total of 70 individuals participated in our survey on preclinical models for BTE: 40% scientists, 51% clinicians, and 7% clinical scientists (Table 1). Since clinical scientists are a rather small population and mostly work clinically, they were included in the “clinician” group in Figure 3. Clinicians included trauma (45%), orthopedic (35%), and craniomaxillofacial (10%) surgeons as well as veterinarians (10%). All levels of professional experience were represented among both scientists and clinicians. A large proportion of participating clinicians frequently perform bone grafting in their clinical routine [11–50 cases (52%) and >50 cases per year (26%)].

Perception of scientists and clinicians toward the clinical availability of BTE in the future and the currently available preclinical models. n = 70.
Survey Participants (n = 70)
Percentages for participants with PhD, professional experience, and bone graft cases per year are given as percentage of the respective group.
Clinicians with double assignments are listed in both categories. Note that not all survey participants provided full information.
CMF, craniomaxillofacial.
First, we were interested in how respondents perceive the future prospects of BTE. Indeed, 98% of all participants (96% of clinicians and 100% of scientists) indicated that they believe BTE will become clinically available in the future (Fig. 3A). The majority of both clinicians and scientists expect this to happen within the next 10 years, with 27% of all participants indicating a time span of 5 years and 53% a time span of 10 years (Fig. 3B). Clinicians were further asked how many of the cases currently treated with bone replacement materials they would treat with BTE if it were clinically available: the largest group (46%) would treat most of their cases with BTE, 16% all, 32% few, and 6% none of their cases (Fig. 3C).
Opinions on preclinical models for testing BTE
The main motivation of this survey was to investigate how satisfactory the currently available preclinical models for BTE are. Participants were therefore asked how they rate the preclinical models (Fig. 3D). Only 10% of all participants answered that the models are well developed and translate well into clinics. A considerable proportion of participants (34%) were of the opinion that the models are well developed but fail when it comes to translation, an even larger group (39%) thought that the models need optimization, and 4% even indicated that the models are poor. Interestingly, scientists were generally more skeptical of the models than clinicians, with 50% of scientists indicating that models need optimization and 9% that models are poor.
Which preclinical models are most satisfactory?
In the last part of the survey, participants with experience in preclinical models were asked to provide details on the models they have used and to rate them as “most satisfied” or “least satisfied.” A full overview of all models named is given in Tables 2 and 3. In line with the results of the literature review (Figs. 1 and 2; Supplementary Table S1), a huge variety of models in small and large animals is applied by different research groups. To assess potential trends of some models or model parameters being more successful than others, we compared how often the respective parameters were named in each of the two categories (most satisfied/least satisfied; Fig. 4). With regard to animal species, most species were named in both categories. However, sheep were clearly named more often in the category “most satisfied” (30% vs. 10%), whereas mice appeared more often in the category “least satisfied” (17% vs. 30%; Fig. 4A). In terms of the age of animals, no trends were apparent for large animals (Fig. 4B). In small animals, studies using younger animals (<2 and 2–4 months) were more frequently mentioned in the category “most satisfied.” Notably, a considerable proportion of responses did not specify the age of the experimental animals, and these appeared more often in the category “least satisfied.” Finally, different defect types were addressed (Fig. 4C). Here, a large number of different defect models in small and large animals were listed in both categories. In general, drill hole defects were listed more frequently among the most satisfactory models, in both small and large animals.
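The comparison described above amounts to expressing how often each parameter value was named as a percentage of all mentions within each rating category, and then pairing the two percentages (as for sheep: 30% vs. 10%). A minimal sketch follows, assuming that each rated model contributes one mention per parameter; the function names are hypothetical and no survey data are reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple


def category_shares(mentions: List[str]) -> Dict[str, float]:
    """Percentage of mentions for each parameter value within one category."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


def compare_categories(most: List[str],
                       least: List[str]) -> Dict[str, Tuple[float, float]]:
    """Map each value to (share in "most satisfied", share in "least satisfied")."""
    most_shares = category_shares(most)
    least_shares = category_shares(least)
    values = sorted(set(most_shares) | set(least_shares))
    return {v: (most_shares.get(v, 0.0), least_shares.get(v, 0.0))
            for v in values}
```

Values absent from one category are assigned 0%, so a parameter named only among the least satisfying models still appears in the comparison. With small numbers of mentions, differences in these shares are best read as trends, as in the text above.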

Survey participants' ratings of the preclinical models they use. n = 34. The graphs summarize which models have been rated “most satisfied” and “least satisfied” with respect to animal species (A), age of animals (B), and defect type (C).
Animal Models Rated as Satisfactory
Details of the bone defect models with which survey participants indicated they were most satisfied.
Two defect sites/types indicated in the same survey, listed separately.
ns, not stated; NZW, New Zealand White; SD, Sprague-Dawley.
Animal Models Rated as Nonsatisfactory
Details of the bone defect models with which survey participants indicated they were least satisfied.
na, not applicable; ns, not stated; SCID, severe combined immunodeficiency.
Discussion
Almost all surveyed scientists and clinicians believe that BTE will become clinically available. Even allowing for a possible selection bias among those who answered the survey, this is a strong statement. More than 75% of the survey participants envision clinical availability within the next 10 years. In contrast to this positive outlook, the vast majority of both groups recognize the need to optimize the preclinical models used to evaluate BTE-based strategies, and the literature review revealed high variability in the animal models used for BTE. Based on the literature review and the survey, we identified critical points that, in our view, need to be addressed to increase the translation of BTE into the clinics.
No consensus on the use of preclinical models in BTE
Our literature review revealed a large variation in the models that have been applied to test tissue-engineered bone constructs in the past 10 years. Interestingly, only 16% of studies were performed in large animals, and only 4.7% in sheep, a species whose bone formation rate, bone size, and biomechanics are comparable to the human situation.17,18 Although this low number of sheep studies can be explained by the high costs and demanding housing requirements of sheep, it also points to a potential reason for the low number of studies with translational success. Indeed, our survey revealed that scientists generally seem to be more satisfied with sheep models than with other species. A large number of studies are performed in rodent models. In particular, the laboratory mouse is one of the most widely used model organisms in biomedical research; however, although it is well defined in terms of genomics and various physiological mechanisms, its shortcomings in simulating human disease have been recognized.19 These are not limited to physical parameters such as size or metabolic rate but extend to genomic differences that arose during evolution. Interestingly, it has been suggested that gene set enrichment analysis might help to identify models with the highest overlap with distinct human diseases.20
Another source of variation between studies is the breed or strain of the experimental animals (data not shown). In rodent models in particular, bone formation rate has been shown to be strain dependent. Accordingly, both peak bone density and bone healing efficiency were shown to differ between inbred mouse strains.21,22 Thus, results obtained in different strains may not be comparable. Nevertheless, transgenic and immunosuppressed mouse and rat strains make it possible to study healing mechanisms and to test xenogeneic constructs containing human cells. However, immunosuppressed models may have limited translational value, since the contribution of the adaptive immune system to the healing process23–27 is, of course, impaired in such models.
The literature search also revealed high variability in the age of animals at study entry. This has several implications. For example, it is critical whether animals are used before or after growth plate closure, that is, before or after skeletal maturity. Kilborn et al. provided an overview of the age at growth plate closure in different species.28 In small animals, growth plate closure of the tibia was reported at an age of 5, 11, and 6.8 months for mice, rats, and rabbits, respectively. This indicates that many preclinical studies are performed before growth plate closure (average age: mice, 2.5 months; rats, 2.75 months; rabbits, 5 months), which may lead to an overestimation of the performance of the tested constructs and of the apparent healing capacity. Moreover, it is well established that bone healing is impaired in aged animals,29 which has been attributed to several local and systemic factors, including vascularization,30 differentiation of osteoprogenitor cells,31 the balance between bone formation and resorption,32 mechanical properties,33 and inflammatory cells.34 It is therefore evident that the age of the animals has a critical impact on study outcome and, eventually, on the translational success of studies. In this respect, the high percentage of studies that do not report the age of the experimental animals is a major concern. In addition, using only young or middle-aged animals at the preclinical stage while targeting the treatment of elderly patients may again overestimate the performance of constructs and challenge the translation of results into clinics. The general satisfaction level of scientists participating in the survey was, however, not correlated with the age of the experimental animals. This presumably reflects merely the reproducibility of the animal models rather than their translational success.
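The comparison between age at study entry and age at tibial growth plate closure is simple arithmetic. The sketch below uses the mean ages from the Results (in weeks) and the closure ages cited above; the conversion factor of 52/12 weeks per month is an assumption, and with it the means come to roughly 2.3, 2.7, and 4.6 months, slightly below the rounded values given in the text. Because only means are compared, the sketch says nothing about individual studies.

```python
WEEKS_PER_MONTH = 52 / 12  # assumed conversion factor (about 4.33)

# Mean age at study entry in weeks, as reported in the Results
MEAN_AGE_WEEKS = {"mouse": 10.0, "rat": 11.7, "rabbit": 19.9}

# Age at tibial growth plate closure in months, as cited in the Discussion
CLOSURE_MONTHS = {"mouse": 5.0, "rat": 11.0, "rabbit": 6.8}


def age_vs_closure(species: str) -> tuple:
    """Return (mean age in months, True if the mean is below closure age)."""
    age_months = MEAN_AGE_WEEKS[species] / WEEKS_PER_MONTH
    return age_months, age_months < CLOSURE_MONTHS[species]


if __name__ == "__main__":
    for species in MEAN_AGE_WEEKS:
        age, immature = age_vs_closure(species)
        print(f"{species}: {age:.1f} months, before closure: {immature}")
```

For all three species the mean age at study entry falls below the reported closure age, which is the basis for the concern about skeletally immature animals.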
As shown in the literature review, bone defect models have been applied at various anatomical regions and defect localizations, which has several implications. First, bones at different anatomical localizations are of different embryonic origin: the cranium partly derives from the neural crest, whereas all long bones are purely of mesodermal origin, and superior bone formation ability has been shown for neural crest-derived cells.35–37 Second, the mechanism of bone formation, intramembranous or endochondral, varies between localizations. Finally, bone formation in the defect gap can also be influenced by other determinants, such as the integrity of, and contact with, the surrounding periosteum as a rich source of progenitor cells,38 which may vary between anatomical localizations and between studies, for example, due to surgical techniques.
Finally, there was also no consensus on defect sizes or on the fixation methods of defects requiring mechanical support, both of which obviously have a critical influence on the bone healing rate. In the survey, drill hole models, which do not require fixation, were generally more often categorized as “most satisfied.” Since adequate fixation is a crucial factor for bone healing, and both insufficient and overly rigid fixation can delay bone healing,39 it may be anticipated that the need for fixation adds complexity to models, which, in turn, may result in a lack of reproducibility.
Although the diversity of models makes it very difficult to compare results between studies, it should also be noted that different research questions targeting different diseases or defect types will require different animal models. Moreover, it is apparent that no model will be able to fully resemble the human situation.
Lack of reporting
The literature review revealed that many studies lack important information (e.g., age, gender), making the results difficult to interpret and the studies irreproducible. This lack of reporting has been recognized and has led to the publication of several reporting guidelines for animal studies, of which the ARRIVE guidelines are the most common.40 Even though many journals have adopted these guidelines, reporting has hardly improved.41 As a possible way to improve the situation, the authors suggest that journals have the completeness and validity of the preclinical information checked by preclinical experts, analogous to the involvement of statisticians. In the context of BTE, not only the details of the animal experiments have to be properly reported, but cells and materials also have to be sufficiently characterized.
Lack of translation and standardization
As there is no consensus on the use of preclinical models in BTE, studies differ widely from each other, making it almost impossible to compare results among studies or to repeat them. As we are working with models and not the “real” situation, all models have their advantages and limitations, mimicking certain aspects of the human situation but lacking others. The decision to use a certain model will be influenced not only by the research question but also by the researcher's past experience, as well as by the available funding and (structural) resources. This also explains the variability in the animal models used. Furthermore, based on this survey, the lack of translation of the currently applied preclinical models is a major concern, as it seriously calls into question the use of these models at all. This is of particular importance with respect to the 3R principle to reduce, refine, and replace animal experimentation,42 which is well recognized among scientists and has recently been extended to a 4R principle by the Max Planck Society, which commits itself to the responsibility of using its knowledge in the life sciences and humanities to promote animal welfare.43 Therefore, models that do not translate into the intended future clinical application in humans should no longer be used to test BTE. If researchers and clinicians could agree on valuable models and if the parameters of these models were better standardized, the significance of each study, as well as the translation of BTE in general, can be anticipated to improve.
The “Minimum Quality Threshold in Pre-Clinical Sepsis Studies (MQTipSS)” is an example of how the situation could be tackled.44 In that field, preclinical models for sepsis were likewise found to be poorly defined and to translate poorly into clinics. In a consensus meeting with international experts, best-practice guidelines for animal models of sepsis were defined, aiming to standardize preclinical models and to improve the translation of preclinical findings. Societies such as the Orthopaedic Research Society or TERMIS might be appropriate starting points for such initiatives. With this article, we aim to start this discussion. We intentionally did not recommend or discuss in detail any particular model (e.g., large vs. small animal model), as this has to be a consensus among the experts in the field. Further, there is no model that fits all research questions, as all models have their advantages and limitations. A certain model might be appropriate for an initial evaluation but inappropriate for submission to the regulatory authorities. Nevertheless, this article clearly shows that there is a need to decrease variability and to agree on a few well-described, translational, and standardized models.
As a starting point, researchers should add a justification of their animal model to their publications, such as the “justification of the test system” in Good Laboratory Practice (GLP) studies. The GLP handbook of the World Health Organization states that it is important for the protocol to contain a reason why the test system has been chosen for the study. Often, this is based on the test facility's background (historical) data with the strain concerned, but there may be special scientific or regulatory reasons. In the authors' opinion, this reasoning should also become part of scientific publications. Providing such a rationale will facilitate the discussion as to whether the chosen model is appropriate and will help to identify its limitations.
Summary
Despite the affirmed clinical need as well as a very positive perception of clinicians toward the use of BTE, the results of the literature review and the survey reveal the lack of well-defined and accepted models.
There is a critical need to optimize preclinical models and, in particular, to improve their translational aspects. A consensus in the field on a limited number of well-defined models should be reached. A first step in this direction is to significantly improve the reporting of experiments to foster reproducibility and comparison between studies. Researchers need to adapt preclinical models to the targeted patient population, in particular in terms of patient age. In addition, the authors recommend including a justification of the animal model in scientific publications.
Footnotes
References
Supplementary Material
