Abstract
Population-level administrative data—data on individuals’ interactions with administrative systems, such as health-care, social-welfare, criminal-justice, and education systems—are a fruitful resource for research into behavior, development, and well-being. However, administrative data are underutilized in psychological science. Here, we review advantages of population-level administrative data for psychological research and provide examples of advances in psychological theory arising from administrative data studies. We focus on advantages in three areas: the collection and recording of population-level administrative data, the data’s large scale, and unique data linkages. We also describe ethical issues as well as methodological considerations and limitations in population administrative data research and outline future directions to enable psychological scientists to more fully capitalize on administrative data resources.
Individuals’ interactions with administrative systems (such as health-care, social-welfare, criminal-justice, and education systems) produce data concerning behavior, development, and well-being. In some jurisdictions, administrative data have been deidentified, linked at the individual level across domains, and made available for research (Fig. 1). In countries such as Sweden, Denmark, Norway, and New Zealand, individually linked administrative data are available for the entire population. In the United States, administrative databases from different sectors cannot yet be integrated at the person level nationwide, but large-scale administrative data sources exist for research, some of which offer very high coverage of the population of interest. For example, information about births and deaths is available through the National Vital Statistics System; health-care use through the Centers for Medicare & Medicaid Services and the Veterans Health Administration; benefits through the Social Security Administration; child maltreatment through the National Child Abuse and Neglect Data System; and school test scores through the National Center for Education Statistics. There is also growing recognition among social-science researchers and policymakers of the potential benefits of administrative data linkage (Penner & Dodge, 2019).

Fig. 1. Example of the types of information available within population-level administrative data collections. The different data domains and specific data sets are from the New Zealand Integrated Data Infrastructure (IDI), a population-level administrative data source. Detailed information about the IDI, including information about additional data sets and the dates from which they are available, is provided at https://www.stats.govt.nz/integrated-data/integrated-data-infrastructure/data-in-the-idi. Adapted with permission from Statistics New Zealand.
Despite the availability of administrative data for research, use of these data is not routine in psychology. Psychologists’ comparatively limited use of administrative data relative to fields such as economics, public health, and demography might reflect a research tradition of conducting highly controlled laboratory experiments and collecting data through self-report, laboratory tasks, and behavioral observation, with less emphasis on secondary data research. However, research utilizing population-level administrative data has already advanced the field (Milne et al., 2022), and increased familiarity with administrative data resources could enable further advances in understanding psychological processes. Here, we review advantages of population-level administrative data for psychological science and provide case examples that illustrate advances in psychological theory arising from administrative data studies. We focus on advantages in three areas: the collection and recording of population-level administrative data, the data’s large scale, and unique data linkages. We also describe ethical issues as well as methodological considerations and limitations in population administrative data research and outline future directions to enable psychological scientists to more fully capitalize on administrative data resources.
Collection and Recording of Population-Level Administrative Data
Psychologists collecting data encounter challenges, including participant burden, financial costs, response biases, and participant dropout. Administrative data address these challenges: They are collected at no burden to participants and less expense to researchers and are less affected by self-report biases and dropout. The data comprise date-stamped records of individuals’ experiences over time (e.g., health treatments, criminal convictions, school test scores, earnings, and social-welfare benefit use), with some administrative collections spanning multiple decades. Population-level administrative data are thus an efficient resource for identifying long-run impacts of early life factors that would otherwise require many years of data collection (Milne et al., 2022). Further, the data’s detailed information on event timing allows researchers to establish that exposures precede outcomes. These features have enabled scientists to refine developmental models of psychological outcomes that emerge over time, into old age. One example is dementia. Depression has been identified as a dementia precursor, but because cognitive decline can start years before a dementia diagnosis, symptoms that appear to be depression may actually be early indications of dementia. To establish depression as a risk factor, it must be measured well in advance of dementia’s risk period. Researchers tackled this challenge by leveraging more than 40 years of Danish register data to show that depression diagnosed early in life (between the ages of 18 and 44 years), in addition to depression diagnosed later, forecast dementia (Elser et al., 2023). Our group has used New Zealand hospital register data to show that long-term associations of mental-health problems with dementia in fact extend to conditions beyond depression (Richmond-Rakerd et al., 2022).
Researchers have also linked administrative data generated at different time points to construct longitudinal studies that yield insights into psychological processes. For example, social-science research suggests that contact between social groups can reduce prejudice (Paluck et al., 2019), but studies of contact across race and ethnicity have typically used cross-sectional or short-term longitudinal data. Researchers investigated sociopolitical outcomes of contact between racial groups using machine learning to link U.S. contemporary political records to 1940 administrative Census data. They found that White men with Black neighbors were more likely to be associated with racially liberal politics more than 70 years later (Brown et al., 2021).
Population-Level Administrative Data’s Large Scale
Population-level administrative data are, by definition, very large in scale. This presents at least five advantages for psychological scientists. First, the data enable the investigation of low-frequency conditions. Researchers interested in uncommon characteristics must often sacrifice representativeness for statistical power. For instance, clinical psychologists studying relatively rare psychiatric outcomes, such as schizophrenia, often use clinical samples to obtain sufficient cases. Administrative registers provide the sample size necessary to investigate such conditions within population-representative data.
Second, population-level administrative data’s large scale helps address other challenges to representativeness and generalizability in psychological research. Convenience samples often used in psychological studies, including undergraduate-student samples, may not represent the broader populations to which psychologists hope to generalize with respect to factors such as race, ethnicity, and education (Thalmayer et al., 2021). Administrative databases capturing all or most of a population of interest, such as birth records, can serve as a sampling frame for data collection or might be used to determine how well a sample represents a target population. Large-scale administrative data also allow researchers to evaluate the generalizability of a finding by testing whether it extends across different population subgroups.
Third, administrative registers facilitate research involving people who may be marginalized, such as those exposed to adversities. These individuals may not be well represented in smaller scale cohort studies because they make up a relatively small portion of the population, or because experiences of adversity may shape their willingness or ability to participate in research. However, the large size of administrative databases, together with limited dropout, presents opportunities for understanding marginalized groups’ experiences in ways that might not be possible in smaller cohorts (e.g., by considering exposure to multiple adversities across different time points). In a Danish register analysis, we evaluated individuals’ exposure to a range of early life difficulties (including parental unemployment, incarceration, mental-health disorders, death, divorce, and foster-care involvement) and considered how exposure timing shaped risk for early adult outcomes, including poor mental health, low education, labor-market disconnection, and crime (Andersen, 2021). Individuals were more likely to experience these outcomes when adversities were experienced in early adolescence rather than early childhood.
Fourth, administrative data’s large scale enables researchers to uncover novel associations that can be further evaluated in smaller scale experiments or cohorts to improve understanding of the underlying mechanisms. For example, in a Swedish register study of more than 1.8 million individuals, researchers found that individuals who self-harm are at elevated risk to also harm others through violent offending (Sahlin et al., 2017). Drawing on theories linking problems in self-regulation with self- and other-directed violence, our team followed up this finding in a U.K. cohort to show that such “dual-harming” adolescents were characterized by self-regulation difficulties already in childhood (Richmond-Rakerd et al., 2019).
Fifth, large-scale administrative data present advantages for carrying out natural experiments. For many exposures of interest in psychology (e.g., maltreatment, mental-health disorder), estimating causal effects through randomized experiments is not ethical or feasible. Natural experiments, which rely on naturally occurring random variation in an exposure, intervention, or characteristic (a “treatment”), can be an effective method for estimating causal effects in the absence of experimenter-controlled random assignment. To infer that a difference between groups reflects a treatment effect, the groups must be well matched; this is more achievable in large samples. Collecting data at the population level also enables researchers to exploit naturally occurring differences in treatment exposure across geographic regions. This approach is well utilized in the study of psychological and health effects of programs and policies (Schwartz & Glymour, 2023).
Natural experiments using large-scale administrative data have made advances in understanding the psychological impacts of difficult-to-anticipate events, including disasters. As an example, in an analysis of Swedish administrative records, researchers exploited variation in birthplace (and associated variation in levels of Chernobyl disaster fallout) to test the long-run effects of radiation exposure on cognitive functioning (Almond et al., 2009). Individuals born in areas of higher fallout had worse educational outcomes, particularly in math. We hope such examples inspire a greater uptake of natural experiments among psychologists. Despite their benefits for causal inference, these designs are underutilized in psychology relative to other social sciences (Grosz et al., 2024).
Administrative Data Linkages
Administrative databases can be linked in ways that present advantages for psychological research. First, data are often integrated at the individual level across domains, such as health, education, criminal justice, social services, housing, and income (Fig. 1). This presents the opportunity to ascertain a range of psychologically relevant measures and test their associations across domains (e.g., childhood relocation with psychiatric, substance-use, criminal-offending, and education outcomes; Bramson et al., 2016).
Second, linked administrative databases capture different system levels. Developmental theories such as ecological systems theory (Bronfenbrenner, 1977) posit that human development is shaped by an interplay between the individual and various levels of their environment. However, it can be challenging to measure macroenvironments at scale. In some administrative collections, residence information can be linked to data about neighborhood and environmental characteristics, providing opportunities to test how these systems are associated with individual functioning. Research in this area is advancing our understanding of the potential psychological benefits of contact with natural environments. For example, using Danish registers, Engemann et al. (2020) connected information about childhood environments to psychiatric diagnostic records to evaluate mental-health outcomes of exposure to vegetation, water bodies, and agriculture and found that exposure to these natural environments was associated with lower rates of psychiatric conditions. Vegetation-density level helped to explain associations, but air-pollution mitigation was less important. This study illustrates the benefits of probing specific pathways underlying associations of large-scale environmental measures with psychological outcomes.
Third, many administrative collections enable linkage of family members through birth records, person registers, or insurance records. These linkages are useful for studies of cross-generational effects (Milne et al., 2022; Fig. 2). They also present the opportunity to implement well-powered sibling-difference and offspring-of-siblings designs that control for familial confounds—unmeasured genetic and family-environmental factors that relate to exposure and outcome and can be third-variable explanations for associations (Fig. 2). These approaches have helped behavioral scientists refine theories about the role of putative environmental risk factors in psychological development. For instance, individuals exposed to maternal smoking during pregnancy are at increased risk for cognitive, social, and behavioral problems, but studies of differentially exposed siblings—including analyses leveraging administrative records—suggest that these associations are not causal but rather attributable to familial confounding (D’Onofrio et al., 2013).

Fig. 2. Research designs enabled by linking administrative data between family members across generations. Multiple generations of family data enable several research designs, including multigeneration family studies and quasi-experimental designs (sibling-difference and offspring-of-siblings designs). Sibling-difference designs account for unmeasured confounders shared by siblings, such as maternal characteristics, features of the home environment, and shared genes. When the siblings are twins, this design additionally controls for pregnancy-related factors and time-varying family circumstances. When limited to identical twins, the design controls completely for shared genes. Offspring-of-siblings designs exploit differences in relatedness between children of siblings (e.g., full siblings, half siblings, fraternal twins, and identical twins) to test the importance of genetic factors for offspring outcomes and the degree to which parental influences are environmental versus explained by shared genes. Adapted from Milne et al. (2022).
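The logic of a sibling-difference design can be illustrated with a small simulation. The sketch below is not drawn from any of the cited studies; it assumes a hypothetical data-generating process in which an unmeasured family-level factor raises both the exposure and the outcome, while the exposure itself has no causal effect. All names (`simulate_sibling_pairs`, `slope`) and parameter values are illustrative assumptions.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_sibling_pairs(n_families, seed=0):
    """Simulate sibling pairs under an assumed (hypothetical) model.

    Each family has an unmeasured confound shared by both siblings.
    The confound drives exposure and outcome; the true causal effect
    of exposure on outcome is set to zero.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        shared = rng.gauss(0, 1)  # unmeasured familial confound
        pair = []
        for _ in range(2):
            exposure = shared + rng.gauss(0, 1)
            outcome = 2.0 * shared + rng.gauss(0, 1)  # no term for exposure
            pair.append((exposure, outcome))
        families.append(pair)
    return families


families = simulate_sibling_pairs(2000)

# Naive analysis: pool all individuals and ignore family membership.
all_x = [x for pair in families for x, _ in pair]
all_y = [y for pair in families for _, y in pair]
naive = slope(all_x, all_y)

# Sibling-difference analysis: regress the within-pair outcome difference
# on the within-pair exposure difference, which cancels the shared confound.
diff_x = [pair[0][0] - pair[1][0] for pair in families]
diff_y = [pair[0][1] - pair[1][1] for pair in families]
within = slope(diff_x, diff_y)

print(f"naive slope: {naive:.2f}")
print(f"sibling-difference slope: {within:.2f}")
```

Under these assumptions the pooled estimate is clearly positive even though the exposure has no effect, whereas the within-pair estimate is close to zero. The comparison only removes confounds that siblings share; factors that differ between siblings remain uncontrolled.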
Fourth, linkage of administrative data to large cohort studies used for psychological research (e.g., the National Longitudinal Study of Adolescent to Adult Health, Health and Retirement Study, and Avon Longitudinal Study of Parents and Children) is increasingly common and can help researchers address methodological challenges related to nonparticipation. Individuals who are recruited into studies and participate over time may differ systematically from those who are not recruited or drop out, resulting in selection bias. Administrative data include a range of potentially important predictors of nonparticipation (e.g., indicators of mental health, physical health, and socioeconomic background) that when available at study baseline can be used to help determine the extent to which nonparticipation may represent a source of bias and help correct for it via statistical methods (Larsson, 2021).
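One family of statistical corrections for nonparticipation is inverse-probability weighting, sketched below with made-up numbers. The example assumes that a baseline indicator from linked administrative records (here a hypothetical two-level `stratum`) is known for everyone recruited, that outcomes are observed only for participants, and that nonparticipation is unrelated to the outcome within each stratum. The data, stratum labels, and function names are illustrative assumptions, not taken from the cited work.

```python
from collections import defaultdict

# Hypothetical cohort: (stratum from baseline administrative data,
#                       participated in follow-up, outcome if observed)
cohort = [
    ("low_ses", True, 10.0), ("low_ses", True, 10.0),
    ("low_ses", False, None), ("low_ses", False, None),
    ("high_ses", True, 20.0), ("high_ses", True, 20.0),
    ("high_ses", True, 20.0), ("high_ses", True, 20.0),
]


def participation_rates(records):
    """Share of each stratum that participated in follow-up."""
    total = defaultdict(int)
    took_part = defaultdict(int)
    for stratum, participated, _ in records:
        total[stratum] += 1
        if participated:
            took_part[stratum] += 1
    return {s: took_part[s] / total[s] for s in total}


def weighted_mean(records, rates):
    """Mean outcome among participants, weighted by 1 / participation rate."""
    numerator = 0.0
    denominator = 0.0
    for stratum, participated, outcome in records:
        if participated:
            weight = 1.0 / rates[stratum]
            numerator += weight * outcome
            denominator += weight
    return numerator / denominator


rates = participation_rates(cohort)
observed = [outcome for _, participated, outcome in cohort if participated]
naive_mean = sum(observed) / len(observed)
ipw_mean = weighted_mean(cohort, rates)

print(f"participation rates: {rates}")
print(f"unweighted participant mean: {naive_mean:.2f}")
print(f"weighted mean: {ipw_mean:.2f}")
```

In this toy cohort the underrepresented stratum is weighted up, so the weighted mean reflects the composition of the full recruited sample rather than that of the participants alone. In practice, participation probabilities are usually modeled from many baseline indicators rather than a single stratum, and a stratum with no participants cannot be reweighted.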
Ethics
Because administrative data are typically deidentified, institutional review boards (IRBs) often do not require informed consent for administrative data research. It has been suggested that the important question regarding administrative data use without informed consent is “whether an individual’s health, interests or confidentiality could be affected negatively” (Lessof, 2009, p. 42). One consideration in linked administrative collections is the potential for deductive disclosure—that individuals could be reidentified through combining multiple sources of information about them. Administrative data research centers have safeguards to protect against deductive disclosure, including multistage deidentification processes (e.g., ensuring that linkage centers, researchers, and data repositories are never in simultaneous possession of individually identifiable linkage data and information about administrative service contacts; Jutte et al., 2011) and reporting requirements for statistical outputs such as the suppression of counts below a certain value (Milne et al., 2019). However, the potential for reidentification may arise with new data-linkage efforts. Investigators should be mindful of these issues and consult their local IRB regarding new administrative data projects.
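As a simple illustration of an output rule of the kind mentioned above, the sketch below suppresses small cell counts before a table is released. The threshold of 6, the suppression marker, and the function name are hypothetical choices for this example; each data custodian sets its own rules, which may involve additional steps.

```python
# Hypothetical threshold: counts below this value are not released.
SUPPRESSION_THRESHOLD = 6


def suppress_small_counts(table, threshold=SUPPRESSION_THRESHOLD, marker="S"):
    """Return a copy of the table with counts below the threshold masked."""
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in table.items()
    }


# Made-up counts of individuals per region.
counts = {"region_a": 124, "region_b": 3, "region_c": 0, "region_d": 6}
released = suppress_small_counts(counts)
print(released)
```

Masking individual cells is only part of the picture: if row or column totals are also published, a suppressed value can sometimes be recovered by subtraction, so output checking typically considers the table as a whole.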
It is also important to consider whether data use has “social license”—whether it is acceptable to population groups and stakeholders. Social license has been found to be greater when there is transparency regarding administrative data composition and intended use and trust in institutions maintaining the data (Kalkman et al., 2022). Additionally, framing of research findings should be sensitive to the potential for results to generate stigma. This may be a particular issue for models predicting sensitive behaviors (e.g., health-risk behaviors, crime) and characterizing vulnerable individuals, which have the potential to stereotype groups to which they belong. Further, because large naturally occurring data sets, including administrative records, are increasingly used for artificial intelligence studies to understand psychological processes (D’Mello et al., 2022; Goldstone & Lupyan, 2016), artificial intelligence researchers should familiarize themselves with relevant ethical considerations and guidelines (Jobin et al., 2019).
Methodological Considerations and Limitations
Administrative records may capture a narrow subset of the population. Data that are not dependent on administrative service contacts (e.g., birth records) comprise all or most individuals in a population. However, many types of data are recorded only for those who come into contact with a particular service and therefore reflect service use rather than need. For example, it is estimated that only 17% of people with alcohol use disorder (AUD) receive treatment (Mekonen et al., 2021). AUD will thus not be captured in treatment records for many people, and those for whom it is captured may differ from the broader population of individuals with AUD in certain respects (e.g., greater service access, treatment-system knowledge, and impairment motivating treatment-seeking). Researchers should be careful not to infer that prevalences estimated from administrative data necessarily represent true population prevalences. Doing so can have real-world implications for affected individuals (e.g., underestimation of mental-disorder prevalence may prompt stigma; Wright et al., 2022).
Administrative systems through which data are collected often reflect practices, beliefs, and norms of the dominant culture. Therefore, although administrative registers’ large scale presents an advantage for studying underrepresented groups, the types of data available may be less likely than data collected through targeted studies to capture unique experiences across diverse populations. Additionally, discriminatory system practices, concerns about stigma from service providers, and variability in service access may lead some groups—including those characterized by underrepresented identities—to be more or less likely to appear in administrative collections. Researchers should remain aware of these factors to avoid drawing inappropriate inferences regarding group differences, which may perpetuate or exacerbate inequities.
Administrative data can provide very precise measurements of certain factors, such as the types, timing, and extent of interactions individuals have with public-service systems. However, some constructs may be assessed less precisely. For instance, in some U.S. administrative claims data sets, race and ethnicity are measured using algorithms whose accuracy and reliability either are unknown or differ across racial and ethnic groups, or race and ethnicity are not measured at all (Nead et al., 2022). These factors could introduce bias and potential harm.
Psychological measures relying on observation, self-report, or subjective ratings such as emotion regulation, social cohesion, and parenting are often not captured within administrative systems (although subjective responses are sometimes captured through linkage to population-based surveys; for examples, see Fig. 1). Linking administrative data to detailed data from surveys and from cohort studies using multiple assessment methods can address this limitation, enabling researchers to interrogate psychological processes underlying associations with officially recorded outcomes (and although less common, linkage to data from smaller scale experiments could also yield valuable insights). For instance, our team connected New Zealand administrative records of individuals’ contacts with health and social services to prospective data from the Dunedin Multidisciplinary Health and Development Study to characterize early life factors that precede adult health and social inequalities (Richmond-Rakerd et al., 2020). Individuals who came into contact with multiple health and social services at a high rate experienced poor brain health in childhood and mental-health difficulties in adolescence (Figs. 3a and 3b). As adults, they reported low life satisfaction (Fig. 3c). These findings provide insights into potential mechanisms for reducing health and social disparities and enhancing population well-being.

Fig. 3. Example of insights yielded by linking administrative records to data from a cohort study. Linking administrative records to data from a prospective cohort study helps to characterize the early developmental factors underlying adult health and social inequalities. Individuals who disproportionately experience multiple health and social difficulties (as indicated by frequent interactions with multiple health and social services) are characterized in early life by (a) poor childhood brain health and (b) poor adolescent mental health. As adults, they report (c) low life satisfaction. Supporting young people’s cognitive and mental health could help reduce health and social disparities and support population well-being. Error bars are 95% confidence intervals. Adapted from Richmond-Rakerd et al. (2020).
Where to Next?
We see several additional avenues for psychological scientists to further capitalize on population administrative data resources and make both substantive and methodological contributions to administrative data research. First, cross-national comparative studies are needed to identify consistency and variation in psychological outcomes across cultures and societies. This work will benefit from efforts to support the development of administrative data resources in low- and middle-income countries (Igumbor et al., 2021), as most research using administrative registers comes from Western and industrialized nations.
Second, psychologists should seek involvement in new data-linkage initiatives. Initiatives are taking place across countries (Gordon, 2020), including the United States (National Academies of Sciences, Engineering, and Medicine [NASEM], 2023). In the United States these include, for instance, the growth of integrated health-information systems, linkage of state-level administrative data across health- and social-services programs (Foust et al., 2022), and efforts to integrate different Census Bureau data resources with each other and with data from public- and private-sector sources (the Frames project; NASEM, 2023). Psychologists might help shape the types of data collected through administrative systems, such as by suggesting items to include in surveys administered by government agencies. Psychologists’ involvement can also encourage new linkages to administrative data sources of interest to social scientists, as well as attention to relevant ethical considerations (e.g., the Belmont principles; Light et al., 2024).
Third, there are opportunities for psychologists to enhance causal inference in administrative data studies. An attractive approach is to incorporate administrative data into randomized experiments (Schwartz & Glymour, 2023). The data can be used in the development and implementation of randomized trials of psychological treatments to generate pilot information and identify target groups, demonstrate between-group equivalence on a range of observable indicators, characterize treatment effects on diverse outcomes, provide a cost-effective mechanism for conducting long-term follow-ups, and evaluate the system-level impacts of policy implementations (Hyatt & Andersen, 2019). For instance, Hyatt and Andersen (2019) showed that administrative register data could be used to evaluate 10-year outcomes—including recidivism, unemployment, and mortality—of different criminal-sentencing options.
Fourth, training in large-scale data management and analysis, and in administrative data protections and limitations, is not routine within psychology, yet it is essential for researchers who wish to work with population-level administrative data sets. Increasing relevant training opportunities and fostering collaborations with fields in which administrative data research is common (for instance, public health, demography, and economics) will build psychologists’ capacities.
Last, interdisciplinary research efforts could yield additional benefits. Collaboration across fields can stimulate the integration of novel theoretical and methodological perspectives into psychological research and expand the diversity of data sources used to address psychologically relevant questions. It can also allow psychological scientists to enhance administrative data research in other disciplines. Psychologists bring unique expertise in theoretical models of cognition and behavior, including the role of individual differences. They can provide important information concerning construct validity in administrative data studies—how well administrative data-based measurements represent the higher order constructs they are intended to measure. Clinical psychologists have insight into real-world clinical practice that shapes the information available within administrative health-care data sets. Thus, interdisciplinary research using population-level administrative data can not only inform psychological science but also expand the reach of psychological science. This article’s authors include individuals with backgrounds in clinical and developmental psychology, life-course epidemiology, sociology, and public health; we have experienced firsthand the benefits of sharing knowledge across disciplines when undertaking research with administrative data.
Recommended Reading
Milne, B. J., D’Souza, S., Andersen, S. H., & Richmond-Rakerd, L. S. (2022). (See References). Reviews the use of population-level administrative data for developmental research.
National Academies of Sciences, Engineering, and Medicine. (2023). (See References). Reports on the ways in which data sources, including administrative records, can be used to enhance the information collected from surveys.
Penner, A. M., & Dodge, K. A. (2019). (See References). Outlines the potential for administrative data research to inform social science and policy, with suggestions for how to advance administrative data infrastructure in the United States (see also the double issue following this article for examples of social-science research using U.S. administrative data resources).
Richmond-Rakerd, L. S., D’Souza, S., Andersen, S. H., Hogan, S., Houts, R. M., Poulton, R., Ramrakha, S., Caspi, A., Milne, B. J., & Moffitt, T. E. (2020). (See References). Provides an example of cross-national replication in population-level administrative data and psychological insights yielded by linking administrative records to cohort study data.
Xafis, V., Schaefer, G. O., Labude, M. K., Brassington, I., Ballantyne, A., Lim, H. Y., Lipworth, H., Lysaght, T., Stewart, C., Sun, S., Laurie, G. T., & Tai, E. S. (2019). An ethics framework for big data in health and research. Asian Bioethics Review, 11, 227–254. Presents a framework for ethical conduct of research using large-scale data resources such as administrative data.
Acknowledgements
We thank Statistics New Zealand for permitting us to adapt their visualization of the data available in the New Zealand Integrated Data Infrastructure. We also thank the anonymous reviewers for their very helpful comments.
Transparency
Action Editor: Robert L. Goldstone
Editor: Robert L. Goldstone
