Abstract
Fish are increasingly used as experimental animals across research fields. Currently, around a quarter of all experimental animals used are fish. Fewer than 20% of these are standard model species. Welfare assessments for experimental fish are in their infancy compared with those for rodents. This can be attributed to the diversity of species used, the relative recency of fish as the go-to model for research, and the challenges of assessing welfare in non-vocal underwater species. The lack of guidelines and tools presents a challenge for researchers (particularly newcomers), for ethics committees and for implementing refinement measures. Here, we present an adaptable, user-friendly score sheet for fish based on MS Excel. The parameters are based on a literature review, have been validated by expert interviews and evaluated by a fish pathologist. The tool allows scoring of individual fish as well as groups, calculates summary scores and visualizes trends. We provide the underlying literature, give use examples and provide instructions on the adaptation and use of the score sheet. We hope that this tool will empower researchers to include welfare assessment in their routines, foster discussions on fish welfare parameters among scientists, facilitate interactions with ethics committees and, most importantly, enable the refinement of fish experiments.
Introduction
A wide range of research areas use fish as experimental models. Topics include toxicological tests, 1 pharmacology, 2 developmental biology,3,4 complex human diseases 5 and behaviour. 6 Fish are also often used to probe the response of aquatic organisms to the impact of global change, ocean warming and acidification. 7 Teleost fish share key genetic, anatomical and physiological properties with humans5,8 and therefore are valuable organisms to investigate basic evolutionary concepts,9 –12 stem cells and regeneration, 13 the impact of chemicals and pollutants on organismic health, 14 and pharmacology. 15 Accordingly, about a quarter (27.6%) of all animals used for research in the EU27 + Norway are fish. Zebrafish (Danio rerio), as the most widely known experimental fish model, constitute ∼13% of all experimental fish, 15 but numerous other species are used in research,5,16 such as medaka (Oryzias latipes), 17 fathead minnow (Pimephales promelas), 18 trout (Salmonidae), 19 cichlids (Cichlidae), 20 sticklebacks (Gasterosteidae), 21 guppies (Poecilia reticulata), 22 mollies (Poecilia latipinna) 23 and gobies (Gobiidae). 24
In accordance with legal requirements to implement the 3Rs (Replace, Reduce, Refine) 25 in animal experiments, and to report the actual severity of procedures, 26 the need to objectively assess welfare in animal experiments is non-negotiable. The 3Rs call for replacing animal models with non-living models, reducing the number of animals used and refining experiments to reduce negative impacts. They are foundational for laws and recommendations on animal welfare in research. 27 Refinement in particular is highly dependent on the monitoring and documentation of animal welfare during husbandry and experimentation. Accordingly, welfare monitoring is legally mandated in many countries and is usually accomplished through score sheets.
Score sheets document the state of welfare and allow the detection of welfare issues by comparing groups, individuals or time-points in a semi-quantitative, semi-objective manner. However, animal welfare remains a tricky concept. European Union regulations on animals used for scientific purposes contain no notion of welfare, 26 and national definitions of welfare vary (ranging from the five freedoms 28 to dignity- or wellbeing-focused approaches 29 ). Fish welfare is particularly difficult to define. 30 While fish are treated as sentient by law, 26 and despite key initiatives,30 –36 some sources still express a need for debate.27,31,37 Knowledge of fishes’ emotions and behavioural changes is available for some species,8,30,38,39 but fish pain perception remains controversial.31,40 Consequently, evaluation tools and validated clinical and behavioural indicators of wellbeing are scarce beyond zebrafish.41 –43
In contrast to e.g. mice, score sheets for fish need to account for a vast diversity of species with distinct behaviours, morphology, physiology and environmental requirements, 8 a task that is not trivial considering that evolutionary distances among fish species can exceed those between humans and armadillos. 44 Characteristics that are well established for rodents or agricultural species, 45 such as vocalizations, body language and behaviours such as grooming,34,45,46 are absent in fish. Occasionally, decipherable behaviours have been identified in some fishes, but most observed behaviours remain understudied or uninterpretable. 39 In addition, fish are often housed in large groups, which complicates individual welfare monitoring and can result in specific welfare problems (e.g. hierarchy or territory issues), but also adds novel observable and measurable behavioural indicators. Finally, fish are often sourced from the wild, which means they enter housing with a range of background conditions considered ‘normal’ (e.g. parasitic infections). 47
Training opportunities on fish welfare are at present severely limited. Good welfare practices are known to depend on skilled staff,8,39 yet fish researchers, authorities and ethics committees are usually limited to superficial broad-band education on ‘non-rodent species’. Fish research publications typically do not include score sheets,8,30 which limits opportunities for discourse about useful and implementable quantitative or qualitative parameters for welfare evaluation. A potential ally can be found in the field of aquaculture, where fish welfare is inherently considered important given its impact on disease incidence, growth rates, meat quality and thus business revenue (e.g. the tool MyFishCheck48 –51), but interactions between research and aquaculture are limited. Together, a lack of resources, expertise and frameworks regarding fish welfare scoring in experimental research poses a serious challenge for researchers in the implementation of the third R for fish, particularly for newcomers to the field.
This publication provides fish researchers with a versatile, multispecies, modular, literature-based and user-friendly score sheet and thereby empowers them to assess the welfare of their animals, implement refinement strategies and fulfil legal welfare monitoring requirements in an evidence-based manner. To develop this tool, we first systematically listed and evaluated fish welfare parameters. This entailed a scoping literature review and an exhaustive compilation of clinical and behavioural parameters that could be used as indicators of the welfare state of fish. We also conducted qualitative guided conversations with researchers and veterinarians working on different fish species and experimental systems. We then reduced the list of potential indicators to a handful of individual and group parameters based on the possibility of systematization, non-intrusiveness of measurements and modularity of parameters (e.g. module ‘behaviour’). For each parameter, we then defined explicit, easy-to-identify deviations from the desirable state. We provide these parameters in the context of a modular and interactive MS Excel-based table that a) accommodates individuals as well as groups of fish and b) can be adapted easily for the species, the experimental context or the housing situation.
We are convinced that the score sheet will simplify welfare monitoring for fish researchers worldwide, promote the use of pre-studies to optimize animal numbers and experimental conditions, facilitate 3R implementation as well as licence application procedures, and serve as a useful starting point for discussions within and between laboratories regarding fish welfare.
Methods
Scoping review
To gather the published state of the art regarding fish welfare assessments, we conducted a scoping review 52 on welfare indicators for juvenile and adult fish. The main research question was ‘Which welfare indicators and score sheets are currently used for fish?’. PubMed and Web of Science were queried with 40 combinations of the terms ‘fish’, ‘rating’, ‘welfare’, ‘indicator’, ‘score board’, ‘score card’, ‘monitoring’, ‘assessment’, ‘humane endpoint’, ‘perch’, ‘trout’ and ‘zebrafish’ (see Supplementary material File 1 online). When a search combination received too many hits to screen and select, the search term was refined. The relevance of papers was assessed by screening the title and the abstract. Papers were retained if these suggested the paper could contain information about fish welfare. Retained papers were then screened for welfare indicators and retained if they mentioned such parameters. Reference sections of retained publications were additionally screened for publications that did not come up in the original search and were subjected to the same screening process. Eighty-four documents were directly retrieved from the databases in round 1. An additional 15 relevant documents were identified from snowballing the references of the round 1 documents. From the 99 documents containing welfare parameters that were retained (Supplementary File 1), a list of welfare parameters was compiled (Supplementary File 1).
Expert interviews
To understand practical considerations and the diversity of needs and implementation situations, 10 fish facilities in Switzerland housing zebrafish, perch, trout, stickleback and cichlids for research on RNA, epigenetics, genetics and genomics, development, health and infection biology, and behaviour were visited. The persons in charge of running the facility were qualitatively interviewed in semi-structured interviews with an interview guide. Questions included fish species, husbandry parameters, types of experiments done, current approaches to measure welfare (tool, criteria, frequency, group or individual level, time, responsible person, treatment of transgenic lines, reasons to not use a score sheet if none was used, source of the tool), wishes and thoughts towards score sheets (what would make it easy to use, digital or analog) and considerations concerning the collected parameters (relevance of parameters, additional suggestions, species-specifics, how to group parameters, potential endpoints, relevance of a total integrating score). Answers were recorded with pen and paper and later transcribed into an MS Excel sheet. Also, four additional conversations were conducted with experts on animal welfare to understand what they would expect from or look for in a scoresheet. Questions are included in Supplementary File 1. The responses, except for a summary and individual quotes, are not shared to maintain participant confidentiality.
Parameters
The parameters identified in the literature were narrowed down to a meaningful, feasible set of 29 parameters in a first round by co-author HS, a diagnostic and research veterinarian and fish pathologist with 20+ years of experience in fish health and fish welfare, and further refined to 21 parameters for individuals and 10 parameters for groups based on the expert interviews and discussions within the authors’ respective research teams (who have experience working with zebrafish, trout, perch, gobies and rodents). Specifically, the following steps were taken: a) non-invasively observable parameters were selected (for example, blood cortisol levels as stress indicators require invasive measurements), b) related parameters were grouped (for example, swimming equilibrium 53 and buoyancy 54 were combined into ‘navigation problems’), c) a description for possible states was developed (for example, ‘normal for the species’, ‘mild weight loss/gain’ and ‘severe weight loss/gain’ for the parameter ‘nutrition’), d) considerations that could affect scoring were collected (e.g. recent feeding may cause non-problematic abdominal swelling), and e) observable states were linked to numeric scores with 0 = ideal state, 1 = not ideal, 2 = observe, act if possible, and 3 = act immediately, potentially terminate (Table 1).
Table 1. Parameters. Welfare parameters, explanations of these parameters, and signs to look for. Includes scores in numbers and in explanations. This table is translated into an interactive scoring tool in Supplementary material File 2 online.
Results
Expert interviews
Expert interviews confirmed that fish welfare assessments and their use in refinement can be considered an area under development, with unsatisfactory options for fish researchers. A frequent practice was to copy other researchers’ score sheets primarily to satisfy administrative requests (quote: ‘Once the canton of … requested more parameters, so we just added one of them’). Opinions on welfare were often based on years of experience working with fish, rather than on explicitly scorable, model-specific and objective criteria. Only three of the 10 facilities were using a score sheet as a welfare tool at the time. Two facilities stated that they did not use a score sheet because it was not compulsory in their region. Usually, more than one person was in charge of assessing fish health. Some facilities stated that certain types of physical/morphological aberrations or behavioural signs were considered ‘normal’ under laboratory conditions and were therefore currently not considered in their score sheets. Isolated parameters were generally deemed insufficient to draw conclusions about the welfare state of individuals and groups. Also, behavioural signs of impairment were considered more relevant than physical signs, which may be linked to the current lack of understanding regarding the relevance of, for example, injury and pain for fish wellbeing. The relevance of an integrative total score was generally supported given the mutual impacts of parameters (e.g. damaged fins might lead to altered swimming behaviour). One request was to include potential measures to alleviate the identified issues in a score sheet, since euthanasia of impacted fish would often create issues for experimental design. A digital version was preferred, and a tablet-compatible app was suggested by several participants.
Score sheet
The score sheet developed based on the review and the interviews is available as Supplementary File 2. It contains 21 parameters for individuals and 10 parameters for groups. These 21 parameters are based on literature from 17 distinct fish species from 11 taxonomic families. They are classified into the categories body condition, skin, fins, eyes, gills/opercula, infection, swimming behaviour, feeding behaviour for individuals, and two additional categories (group behaviour and mortality) for groups.
For individually housed or marked fish, for example, in cichlid behaviour experiments, 6 each parameter is assessed by choosing from a descriptive drop-down menu (Figure 1); for example, researchers may choose between ‘fin in healthy condition’, ‘slightly eroded/slightly missing’, ‘moderate fin loss’ or ‘severe fin loss/missing fin’. MS Excel then automatically returns a numerical score according to Table 1. For fish housed in groups, the percentage of fish in the tank with a particular issue is recorded (Figure 1). For example, 80% of fish may be judged to display ‘no abnormalities/lesions’, 10% with ‘slight abnormalities/lesions’, 5% with ‘moderate abnormalities/lesions’ and 5% with ‘severe abnormalities/lesions’ on their fins. MS Excel then returns a weighted average for the tank. Data entry errors resulting in more or less than 100% of fish result in an orange field alert. Skipping the assessment of a parameter in the individual score sheet results in a default score of 1. This avoids the generation of good welfare numbers due to non-scoring. If parameters are not feasible or not relevant, we recommend deleting the parameters from the score sheet during the initial adaptation phase, instead of not scoring them in the experiment.

Figure 1. Score sheets. (a) Score sheet for individual animals. (b) Score sheet for groups of animals. These are screenshots taken from the interactive scoring tool provided as Supplementary material File 2 online.
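The scoring logic described above can be sketched in a few lines of Python. This is an illustration only and not the implementation inside the MS Excel tool: the function names are hypothetical, and we assume that the four fin options map to the scores 0 to 3 in the order listed and that the group result is the average of those scores weighted by the percentage of fish in each state.

```python
# Assumed mapping of drop-down labels to scores (0 = ideal ... 3 = act immediately).
# The labels are taken from the text; their assignment to 0-3 is an assumption.
FIN_SCORES = {
    "fin in healthy condition": 0,
    "slightly eroded/slightly missing": 1,
    "moderate fin loss": 2,
    "severe fin loss/missing fin": 3,
}

# A skipped parameter defaults to 1 so that non-scoring cannot look like good welfare.
DEFAULT_SCORE_IF_SKIPPED = 1


def individual_score(label, mapping):
    """Return the numeric score for one parameter of one fish."""
    if label is None:
        return DEFAULT_SCORE_IF_SKIPPED
    return mapping[label]


def group_score(percent_by_score):
    """Return the weighted average score for a tank.

    percent_by_score maps each score (0-3) to the percentage of fish in that state.
    Percentages that do not add up to 100 are rejected, mirroring the alert
    that the tool raises for such data entry errors.
    """
    total = sum(percent_by_score.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return sum(score * pct for score, pct in percent_by_score.items()) / 100


# Fin example from the text: 80% no, 10% slight, 5% moderate, 5% severe abnormalities.
tank_fin_score = group_score({0: 80, 1: 10, 2: 5, 3: 5})
```

Under these assumptions, the tank in the fin example receives a weighted score of 0.35, and a skipped individual parameter contributes 1 rather than 0.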
For each assessed timepoint, the score sheet tool provides a total, median, mean and maximum score. Seven assessments (e.g. for one week) are designed to fit on an A4 printout. A green-to-red colour code on a relative scale provides a first visual overview of changes, but, more importantly, a separate trend tab is designed to track developments across assessments. To visualize trends, users copy–paste values from the score sheet tab to the trends tab (detailed instructions are given within the tool). The trend tab accommodates up to 31 successive assessments (e.g. for one month) on one A4 area.
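For readers who want to reproduce the summary values outside MS Excel, the following minimal sketch computes the same four statistics for one timepoint. The function name and the example scores are hypothetical and do not come from the tool.

```python
from statistics import mean, median


def summarize_timepoint(scores):
    """Return total, median, mean and maximum of the parameter scores of one assessment."""
    if not scores:
        raise ValueError("no scores recorded for this timepoint")
    return {
        "total": sum(scores),
        "median": median(scores),
        "mean": mean(scores),
        "max": max(scores),
    }


# Hypothetical scores for seven parameters assessed on a single day.
example = summarize_timepoint([0, 1, 0, 2, 3, 0, 1])
```

Reporting the maximum alongside the mean is useful because a single parameter at score 3 can be hidden by many parameters at score 0.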
Use scenarios
Below, we use three hypothetical, but common, research scenarios to demonstrate potential applications of the score sheet. These are 1) a pre-trial to identify suitable welfare scoring parameters for a genetically modified zebrafish strain with morphological phenotypes, 2) a pre-trial to define a suitable acclimatization period of wild-caught fish to laboratory conditions, and 3) the monitoring of fish welfare during an infection trial to identify appropriate termination criteria.
Scenario 1 (Figure 2) is based on a chemical exposure experiment similar to that of, for example, Baraban et al. 2013. 55 Effects of the chemicals are to be screened in a line of genetically modified zebrafish (mutants) with morphological defects in spine and eyes, and slightly elevated mortality. A pre-trial is designed to determine which parameters are suited to detect any additional impairment caused by the experiment. To this end, a group of the line is scored over the course of a month. It turns out that scores relating to anatomy and swimming are constitutively elevated due to the defects inherent to the line, while skin and feeding parameters fluctuate around zero. Line mortality is 10% higher than usually observed in the wildtype. Based on these – now documented and quantified – observations, the researchers design and submit a score sheet for the exposure experiment. They decide to pay particular attention to skin and feeding parameters, since these parameters display variation and may therefore be sensitive to stressors. Also, they account for the 10% mortality rate during sample size and power calculations and can justify the inclusion of an additional 10% of animals in their animal experimentation permit application. They continue to record anatomy and swimming behaviour but – given that these parameters are constitutively high, reducing their sensitivity – flag them as potentially unhelpful.

Figure 2. Application Scenario 1. Example for a group scoring of an impaired mutant line of zebrafish, trends tab. The trends tab translates the scores into visual patterns, on the one hand through score-dependent colouring of the cells, on the other hand through trend lines. Each value in the top table corresponds to a datapoint in the trend lines (some pointed out by circles and arrows). The summary scores, external signs, and group/behavioural signs are each collated in a separate trend line figure.
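The two decisions taken in Scenario 1 can be illustrated with a short sketch. Everything here is hypothetical: the function names, the baseline values and the cutoff of 1.0 for calling a parameter constitutively elevated are assumptions made for illustration, and the way mortality enters a real power calculation may differ from the simple 10% surcharge shown.

```python
import math


def flag_parameters(baseline, elevated_cutoff=1.0):
    """Label each parameter by how it behaved during the pre-trial.

    A parameter whose lowest baseline score never drops below the cutoff is
    labelled 'constitutively elevated'; all others are 'candidate indicator'.
    """
    return {
        name: (
            "constitutively elevated"
            if min(scores) >= elevated_cutoff
            else "candidate indicator"
        )
        for name, scores in baseline.items()
    }


def animals_with_mortality_allowance(n_required, extra_percent=10):
    """Add extra_percent additional animals, rounded up to whole animals."""
    return math.ceil(n_required * (100 + extra_percent) / 100)


# Hypothetical group scores (weighted tank averages) from three pre-trial assessments.
baseline = {
    "spine": [2.1, 2.0, 2.2],
    "swimming": [1.5, 1.4, 1.6],
    "skin": [0.0, 0.1, 0.0],
    "feeding": [0.1, 0.0, 0.2],
}
flags = flag_parameters(baseline)
n_animals = animals_with_mortality_allowance(40)
```

With these illustrative numbers, spine and swimming are flagged as constitutively elevated, skin and feeding remain candidate indicators, and a requirement of 40 animals becomes 44.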
Scenario 2 (Figure 3) is based on a behaviour experiment similar to that of Jutfelt et al. 2017. 56 Wild-caught individuals of a wild fish species are given the choice between two environmental conditions to determine their preference or tolerance. A pre-trial is designed to determine a suitable acclimatization period between catching the fish and starting the experiment: short enough to reduce captivity-induced stress, long enough to prevent capture-related stress from confounding the experiment. To this end, several fish are caught and their welfare is assessed over the course of a month. Figure 3 shows data from one representative individual. At no time during the monitoring period are signs of external damage, lesions or infection observed. However, a diverse range of behavioural abnormalities (e.g. stereotypical swimming and startle responses to movement and sounds) are observed in freshly caught fish. These decline in intensity and from day 19 onwards no behavioural abnormalities are observed. The researchers therefore conclude and can justify that an 18-day acclimatization period is required and sufficient for the planned behavioural experiments.

Figure 3. Application Scenario 2. Example for an individual scoring of a wild-caught fish during the acclimatization period, trends tab. Both the score-dependent colouring on the top and the trend lines at the bottom suggest that the fish is stressed until day 13 of the acclimatization. No behavioural signs of stress are observed after day 19.
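The reasoning in Scenario 2 amounts to finding the last day on which any behavioural abnormality was scored. A minimal sketch follows; the function name and the daily scores are hypothetical and chosen only so that the last abnormal day is day 18, as in the scenario text.

```python
def acclimatization_days(behaviour_scores):
    """Return the last day with a behavioural score above 0 (0 if there is none).

    behaviour_scores maps the day of captivity to the behavioural score recorded
    on that day. Taking the last abnormal day means that a relapse after a few
    clean days extends the acclimatization period instead of being ignored.
    """
    abnormal_days = [day for day, score in behaviour_scores.items() if score > 0]
    return max(abnormal_days, default=0)


# Hypothetical one-month record: abnormalities decline and vanish from day 19 onwards.
scores = {day: 3 for day in range(1, 7)}
scores.update({day: 2 for day in range(7, 14)})
scores.update({day: 1 for day in range(14, 19)})
scores.update({day: 0 for day in range(19, 31)})

days_needed = acclimatization_days(scores)
```

In practice, the result from one representative individual would be checked against the other fish in the pre-trial before fixing the acclimatization period.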
Scenario 3 (Figure 4) is based on a drug trial for an infectious disease similar to that of, for example, Kam et al. 2022. 57 In the spirit of refinement, the experiment must be long enough to show differences between control and treatment, and short enough to avoid suffering. Given the infectious disease component, a termination criterion is of the essence: When should a sick animal be euthanized and removed from the experiment? In a pre-trial, the progression and time course of the infection is therefore documented in a small group of fish. The score sheet reveals that skin defects are the first symptom after pathogen exposure, and continuously increase in severity after their appearance. Later, swimming issues (loss of buoyancy) are observed and increase in prevalence until the end of the pre-trial. The researchers decide on a 25-day experimental duration. They also decide to terminate tanks that exceed a total score of 11 for two days in a row, since this would indicate that the drug exacerbates the infection rather than mitigates it.

Figure 4. Application Scenario 3. Example for an individual scoring of an experimental fish exposed to a compound. External signs (i.e. skin defects) are observed before the appearance of behavioural signs and get progressively worse over the course of several weeks. Based on these data, the end point of the planned experiment is determined after onset of signs, but before welfare is too compromised.
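The termination rule chosen in Scenario 3 (a total score above 11 on two consecutive days) is easy to state but also easy to apply inconsistently by eye. The sketch below shows one way to check it automatically; the function name and the example totals are hypothetical, and the threshold and number of days are parameters precisely because the score sheet does not prescribe them.

```python
def first_termination_day(daily_totals, threshold=11, consecutive_days=2):
    """Return the first day (1-indexed) on which the termination rule is met.

    The rule is met when the total score has exceeded the threshold on
    consecutive_days days in a row. Returns None if the rule is never met.
    """
    run = 0  # number of consecutive days above the threshold so far
    for day, total in enumerate(daily_totals, start=1):
        run = run + 1 if total > threshold else 0
        if run >= consecutive_days:
            return day
    return None


# Hypothetical daily total scores of one tank.
totals = [3, 5, 12, 9, 12, 13, 14]
termination_day = first_termination_day(totals)
```

In this example the single exceedance on day 3 does not trigger termination because the score drops again on day 4; the rule is first met on day 6.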
Implementation and adaptation of the score sheet
Scoring 21 parameters on multiple fish every day is not sustainable in the long run. A key step in the use of the score sheet is therefore an adaptation phase with a) test scoring of the parameters, b) choosing parameters that are truly useful to detect changes in the given species, situation and experiment, and c) creating an experiment-specific score sheet by deleting non-informative rows from the score sheet, adding parameters, or adapting the scores associated with a particular issue.
We strongly encourage a rigorous and information-driven reduction of the score sheet to achieve maximum information and compliance with minimal effort. The score sheet file contains instructions for the adaptation and implementation phase (Supplementary File 2, tab ‘instructions’). The adaptation phase encompasses six steps: ‘Adapt to setting’, ‘Get familiar’, ‘Generate baseline’, ‘Decide’, ‘Adapt criteria and scores’ and ‘Termination criteria’. We explicitly and strongly discourage uncritical use of the score sheet as-is without an optimization phase. The score sheet file also contains instructions for the implementation phase based on three steps: ‘Data entry’, ‘Documentation’ and ‘Statistics’. These steps encourage sustainable long-term data preservation strategies and support good data management practices, for example, for reporting to ethics committees.
Discussion and conclusion
This paper aims to provide a flexible framework to implement species-tailored score sheets across experimental models, experimental designs, and housing conditions. It is by no means the first attempt to develop a score sheet for experimental fish (see, for example, Table 3 in Martins et al. 2016 58 or Table 2 in OECD 42 ). However, to our knowledge, it is the first adaptable tool that extends beyond a specific species or context and which can be moulded to truly fit situation-dependent requirements.
Applicability
The diversity of fish species kept for research purposes, and their housing both as single fish and large groups, are challenges for welfare assessment. The presented work tackles this in two ways. First, the developed score sheet remains adaptable. Parameters can be removed (which is recommended), changed or added (e.g. for species-specific behaviours, such as mouth brooding or nest building). Second, the score sheet emphasizes the value of relative assessments and trend monitoring. It does not provide absolute numerical cutoffs a priori, and instead empowers users to develop meaningful termination criteria for the respective species, strain, facility and experiment. The operationalization of termination criteria (endpoints where an experiment must be terminated, and animals euthanized, to prevent excessive suffering) is one of the most important aspects and purposes of score sheets. How to arrive at a combined measure of maximally tolerable burden is one of the most important, but also most difficult and controversial parts of welfare assessments. Importantly, termination criteria depend heavily on the weighing of interests and the purpose of the experiment (what extent of suffering is justifiable for the expected gain of knowledge). This score sheet should facilitate the definition of termination criteria and facilitate conversations on termination criteria among fish researchers.
Implications
This score sheet can benefit experimental fish and fish researchers in five ways. First, the tool empowers the individual researcher. Our interviews revealed a prevalence of subjective assessments which may or may not reflect animal welfare criteria, a certain acceptance of stereotypies as given, and a dependence on individual knowledgeable animal care-givers. These aspects could benefit from more objective scoring options. Second, applying this tool fosters within-group interactions and cross-facility communication regarding welfare and termination criteria. This may improve collective welfare knowledge in the fish research community and inspire initiatives and research towards better welfare assessment in fish. Third, the tool may facilitate aspects of regulatory processes regarding animal experimentation. In Switzerland, score sheets are a mandatory component of animal experimentation applications. This tool may support researchers in defining and submitting evidence-based score sheets and stop-criteria. Fourth, ethics committees are usually unfamiliar with fish welfare. This tool could support ethics committees in identifying well-planned and well-monitored experimentation with fish and facilitate communication between researchers and committees. A fifth aspect relates to teaching. Score sheets are valuable tools to discuss fish needs, health and welfare, and can be used to train junior researchers in accurate and attentive evaluation of signs and behaviours.
Recommendations
A key factor in welfare monitoring is compliance and constancy. Given that welfare monitoring ties up time and personnel resources, we strongly recommend choosing a few meaningful parameters and monitoring them at longer time intervals, rather than adopting more intensive monitoring plans. Overburdening researchers or animal care-givers is a recipe for failure for a monitoring system.
Prospectively, cross-species and cross-field validations and implementation tests of the presented monitoring tool would be valuable. In terms of technical solutions, a mobile application would greatly facilitate regular monitoring, particularly if combined with visual tracking applications, 59 AI approaches, sensors and existing husbandry management tools such as PyRAT Aquatic. 60
Supplemental Material
Supplemental material, sj-xlsx-1-lan-10.1177_00236772241271013 for An adaptable, user-friendly score sheet to monitor welfare in experimental fish by Mathilde Flueck-Giraud, Heike Schmidt-Posthaus, Alessandra Bergadano and Irene Adrian-Kalchhauser in Laboratory Animals
Supplemental material, sj-xlsx-2-lan-10.1177_00236772241271013 for An adaptable, user-friendly score sheet to monitor welfare in experimental fish by Mathilde Flueck-Giraud, Heike Schmidt-Posthaus, Alessandra Bergadano and Irene Adrian-Kalchhauser in Laboratory Animals
Footnotes
Acknowledgements
We are grateful to numerous experts who contributed their time and knowledge in personal conversations and during site visits, in particular (in alphabetical order): Heinz Belting, Fabienne Chabaud, Nicolas Diserens, Anna Gliva, Marcel Häsler, Ahmet Kürk, Ines Dos Santos, Alba Aparicia Fernandez, Sebastian Leidel, Nadia Mercader, Catherine Peichel, Attila Rüegg, Verena Saladin, Dragan Stajic, Jaques Voland, Gilles Willemin, and Hanno Würbel. We thank Catherine Peichel, Hanno Würbel, and Bernhard Völkl for their supportive and insightful feedback on earlier versions of this manuscript, and Gary Delalay and Eliane Jemmi for proofreading the French and Spanish abstract translations.
Author contributions
IAK conceived the project; HS and IAK designed the analysis with input from MFG; MFG collected the data and performed the analyses with support from AB and HS; HS and IAK supervised and guided analyses; MFG wrote the manuscript draft; HS, IAK, AB and MFG edited the manuscript.
Data availability
All data used are available in the Supplementary material. Supplementary File 1: MS Excel with four tabs: literature search parameters, list of retained and consulted literature, extended list of potential parameters including sources, questions from expert interviews. Supplementary File 2: MS Excel score sheet tool with six tabs: instructions, codebook, scoresheet individuals, trends individuals, scoresheet group, trends group.
Declaration of conflicting interests
The authors have no conflicts of interest to declare.
Funding
The authors received no financial support for the research, authorship, and/or publication of this specific article. The article was, however, inspired by work funded by the Swiss National Science Foundation #212526 Trout Immune Priming and #204838 MiCo4Sys.
Research ethics
No animals were used for this research.
