Abstract
Robust brain extraction is a critical first step for quantitative neuroimaging, yet no dedicated method currently exists for porcine magnetic resonance imaging despite the importance of the pig as a translational animal model in neuroscience, particularly for gyrencephalic traumatic brain injury (TBI). Porcine anatomy presents distinct challenges, with extensive extracranial fat and complex textures around the olfactory bulbs that limit the performance of existing human-based tools. We present PIGSKIN, a deep learning framework trained primarily on synthetic data generated from a small set of expert annotations. Unlike conventional approaches, PIGSKIN models brain and non-brain regions separately, applying distinct clustering and transformation parameters to each. This strategy constrains variability in brain anatomy while allowing greater diversity in extracranial tissues, ensuring anatomically consistent label maps in which the brain remains embedded within its surroundings. Additional transformations introduce further spatial and intensity variability, producing a diverse set of synthetic training pairs. During inference, PIGSKIN operates in a single step at native resolution within a standardized cube, preserving fine-scale anatomical detail and supporting generalization across cohorts differing in breed, acquisition protocol, and injury model. The model achieved performance comparable to expert consistency (Dice ≈ 0.97), approaching reported inter-rater reliability. Finally, we show that incorporating co-registered T1- and T2-weighted inputs significantly outperforms single-modality training, underscoring the value of multimodal integration for synthetic data generation. Together, these results establish PIGSKIN as the first systematically validated solution for porcine brain extraction and a framework adaptable to other large animal brain extraction tasks.
Introduction
The pig has long served as a valuable biomedical research model and, more recently, has been utilized as a neuroscientific preclinical model, including for advanced neuroimaging. Porcine brain injury models are increasingly recognized as important for neurotrauma research due to their neuroanatomical and physiological similarities to the human brain.1,2 Unlike rodent brains, porcine brains are gyrencephalic and possess higher white-to-gray matter ratios, features that make them more neuroanatomically similar to humans than rodents. As such, they represent a valuable model for studying traumatic brain injury (TBI) and its structural and functional sequelae.1–3 Magnetic resonance imaging (MRI) of pigs, before and after injury, provides a non-invasive, longitudinal window into the evolution of brain pathology and may provide important insights into the mechanistic underpinnings of TBI-related imaging changes observed in humans.
Quantitative analysis of human MRI has consistently demonstrated sensitivity to structural and microstructural changes following TBI.4,5 To enable meaningful cross-species comparisons and translational insights, it is important to adapt and apply similar analytic frameworks to porcine MRI. The field of quantitative neuroimaging relies on automated image processing pipelines that begin with brain extraction (i.e., skull stripping), followed by tissue segmentation, atlas-based parcellation, and the derivation of regional quantitative metrics such as volumetric and microstructural integrity measures.6,7 While robust and well-validated tools exist for human brain MRI, including atlas-based and deep learning–based methods such as BET, HD-BET, and SynthStrip, developed for clinical and research use in large-scale datasets, they do not generalize well when applied directly to non-human species, particularly pigs.8–10
Translating methods of brain extraction from humans to pigs is challenging due to substantial anatomical differences. In contrast to the relatively clean and symmetric separation of the human brain from the skull, the porcine brain is tightly encased in heterogeneous non-brain tissues, including muscle, fat, and connective tissue, that complicate boundary delineation on MRI (see left panel of Fig. 1). In addition, features such as a prominent olfactory bulb, the inferior extension of the pituitary region, and the frequent inclusion of the upper cervical spinal cord within the imaging field-of-view introduce topological complexities absent in human brains. These structural protrusions challenge the assumptions of smoothness and compactness that underlie conventional brain extraction algorithms,11,12 leading to unreliable segmentation when applied to pig imaging data (see right panel of Fig. 1). Addressing these challenges is essential for enabling automated, scalable, and reproducible analysis of porcine MRI in translational research. Despite the critical need for automated brain extraction in pigs, existing animal pipelines have focused primarily on rodents,13 sheep,14 non-human primates,15 or neonatal piglets.11,16 These efforts demonstrate the feasibility of species-specific pipelines but remain limited in scope and do not generalize across acquisition protocols, breeds, or injury conditions. More recently, several studies have begun to address porcine brain extraction directly; however, these methods are still restricted to neonatal cohorts, rely heavily on manual annotations, or use single-modality training, and no robust solution yet exists for adult porcine MRI across modalities and injury models. As such, manual delineation remains the de facto standard, although it is time-consuming and labor-intensive and requires specialized anatomical knowledge of the porcine brain that is scarce. Moreover, manual annotation is not scalable for high-throughput analyses or for studies involving longitudinal data.

Comparison of brain extraction performance in human versus porcine MRI using a human brain extraction tool.
We developed PIGSKIN (Porcine Brain Extraction Tool), the first automated framework for pig brain extraction in MRI using fully synthetic data. Our approach combines a single-stage U-Net with a dynamic synthetic data generator that models foreground (brain) and background separately, expanding limited annotations into diverse, anatomically realistic image–mask pairs. By leveraging multimodal clustering of T1- and T2-weighted images, PIGSKIN achieves performance comparable to expert consistency and generalizes robustly across modalities, injury models, and datasets. This establishes PIGSKIN as the first robust solution for adult porcine brain extraction and a broadly applicable framework for annotation-scarce neuroimaging domains.
Materials and Methods
Overview
We developed a deep learning pipeline for automated porcine brain extraction in T1- and T2-weighted MRI to address the lack of species-specific tools and the inability of human-focused methods to account for the distinctive brain morphology and complex extracranial structures of pigs. As a first step, we introduced a multichannel clustering framework to generate initial label maps. To our knowledge, this is the first clustering approach for porcine MRI that leverages both T1- and T2-weighted contrasts, where extracranial tissues such as fat layers and the olfactory bulbs become more conspicuous, producing intricate regions that often confound conventional algorithms. By explicitly capturing these features, the resulting label maps provide greater generalization than those derived from single-modality clustering. Building on this foundation, the clustered label maps were used as the substrate for synthetic augmentation: random transformations such as scaling and rotation were applied, and because the entire brain moves consistently with the deformation, its identity is preserved. When paired with intensity-varied images, this process yields a continuous stream of realistic image–mask pairs, enabling stable training until convergence. Importantly, once trained, the model weights will be released and can be applied directly to new datasets without further training, unless fine-tuning is required for a specific modality. Together, this pipeline establishes the first robust, scalable, and readily deployable solution for adult porcine brain extraction.
Datasets
To ensure generalizability, we used data from two independent porcine TBI cohorts representing different injury mechanisms (focal and diffuse) and pig strains (Yucatan and Hanford minipigs). The first cohort employed the controlled cortical impact (CCI) model, a well-established paradigm of focal injury. Five pigs were scanned longitudinally at baseline (pre-injury) and at multiple post-injury time points (3 days, 1 month, 3 months, and 6 months), yielding 13 imaging sessions in total. The second cohort used a rotational-acceleration model, which produces diffuse axonal injury without focal lesions. This cohort also included 5 pigs scanned at pre- and post-injury time points, yielding 10 imaging sessions in total. Together, these cohorts provided complementary datasets spanning distinct types of brain pathology.
For the CCI cohort, structural MRI volumes were acquired at approximately 0.54 mm isotropic resolution (matrix ≈ 280 × 192 × 352). Both T1- and T2-weighted scans were obtained, and T2 volumes were rigidly registered to the corresponding T1 images for alignment. For the rotational-acceleration cohort, T1-weighted images were acquired with a matrix of 192 × 256 × 160 and a voxel size of 0.70 mm isotropic; only T1-weighted data were available for this cohort. All images were resampled to 0.54 mm isotropic resolution for consistency across datasets and to enable high-resolution segmentation. Downsampled versions at 1.0 mm isotropic resolution were additionally generated for the coarse localization step of PIGSKIN. Acquisition parameters and imaging characteristics for both cohorts are summarized in Table 1.
Dataset Characteristics and Acquisition Parameters
Details of the experimental datasets used in this study, including MRI sequence type, acquisition parameters (TR, TE, TI, FA), matrix dimensions, and voxel size. Both T1- and T2-weighted scans were acquired for the controlled cortical impact (CCI; focal injury) cohort, whereas only T1-weighted scans were available for the rotational (diffuse injury) cohort. Acquisition parameters were extracted from scanner metadata, and voxel dimensions were derived from the Neuroimaging Informatics Technology Initiative headers.
FA, flip angle; TE, echo time; TI, inversion time; TR, repetition time.
Data annotation
Manual brain annotations were performed by two experts using ITK-SNAP.17 Expert 1 has over 10 years of experience, and Expert 2 has over 3 years of experience. Each expert independently used the 3D round-shaped paintbrush tool to delineate the brain’s soft tissue on axial slices, proceeding slice by slice. The inferior boundary of the annotation was defined at the lowest portion of the medulla. Each subject was inspected across three orthogonal planes (coronal, axial, sagittal) to ensure smooth boundaries and 3D anatomical consistency. For illustration, we report annotation details on a representative baseline subject (animal ID 81, pre-injury scan; hereafter referred to as the reference subject). For this subject, annotation required approximately 1 h, involving review of ∼170 coronal slices, ∼100 axial slices, and ∼100 sagittal slices. On average, annotation required ∼10 s per slice for Expert 1 and ∼15 s per slice for Expert 2, with additional time devoted to challenging regions such as the olfactory bulbs and brainstem to ensure accurate delineation. When disagreements arose, annotations were jointly reviewed and resolved through a consensus adjudication process, yielding a finalized reference mask used for all downstream training and evaluation. Most ambiguities were localized to the brainstem, particularly at its inferior boundary, and were resolved through an additional focused review and correction pass. The level of agreement between the experts was quantified as the inter-rater Dice score (see the “Results” section).
Training/test split
For the CCI cohort, seven unique pigs were used to seed the synthetic training generator. Two training configurations were evaluated: (1) a model trained exclusively on healthy (pre-injury) scans and (2) a model trained on the full longitudinal dataset including both pre- and post-injury time points. The remaining four CCI pigs were held out entirely for within-dataset testing, ensuring subject-level separation between training and evaluation. To further assess generalizability, both trained models were applied without retraining to the second independent rotational cohort, representing a distinct injury mechanism and acquisition distribution. Performance was evaluated separately for healthy and injured cases across both cohorts, and the corresponding results are summarized in Figure 2.

Overview of the U-Net architecture for image segmentation. The contracting path (left) applies convolution and max pooling to capture context, while the expansive path (right) uses upsampling and skip connections to recover spatial resolution. Skip connections concatenate feature maps from corresponding layers, enabling precise localization in the final segmentation output.
Model architecture for brain segmentation
The PIGSKIN framework employs a single-stage volumetric 3D U-Net trained and tested at native resolution. Each scan is cropped or padded to a standardized cube of
The U-Net was implemented using the Voxelmorph framework.18
The network comprised 7 encoding–decoding layers with skip connections, each resolution level containing two
Connected-component filtering was applied post hoc to remove spurious islands, ensuring anatomically plausible final masks. For multimodal training, T1- and T2-weighted volumes were co-registered and processed jointly, with significant improvements observed over single-modality training (see the “Results” section).
Synthetic training data using clustering and spatial transformations
We developed a generative pipeline that dynamically produces synthetic image–mask pairs during training. In each iteration, expert-derived binary brain masks are first spatially transformed, after which foreground and background regions are clustered independently using Gaussian mixture models (GMMs) prior to label-to-image synthesis.
At the core of this pipeline is GMM-based clustering, which models the distribution of voxel intensities as a weighted sum of Gaussian components. The full synthetic augmentation workflow is illustrated in Figure 3, which depicts the sequence of spatial transformations, GMM clustering, and label-to-image synthesis steps that together generate anatomically consistent training pairs. Throughout this process, cluster identities corresponding to the foreground (brain) region remain intact and are consistently mapped to a binary brain mask, ensuring that the final synthetic image and its corresponding binary label are spatially and semantically aligned. All intensity-domain augmentations (e.g., Gaussian noise, Gaussian blur) were applied exclusively to synthesized images following label-to-image translation. Discrete label maps were never subjected to intensity perturbations or boundary blurring.

Synthetic data generation pipeline. Overview of the synthetic training data generation and segmentation pipeline.
Given the expert-annotated binary label maps
The affine transformation
The non-linear component
The augmented label map is then given by:
Gaussian mixture modeling for clustering
Given the deformed label map
The clustered foreground and background components were subsequently passed to a spatial augmentation module to generate anatomically consistent variability, followed by label-to-image synthesis as described in the subsequent sections.
A label-to-image synthesis module then converted the perturbed label maps into realistic MRI volumes paired with binary masks. Because this procedure was executed on-the-fly during training, the network was continually exposed to novel examples, ensuring robustness and reducing overfitting risk, while providing effectively unlimited synthetic training samples.
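The independent clustering of brain and non-brain voxels described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `GaussianMixture`, co-registered T1 and T2 arrays of equal shape, and hypothetical cluster counts `k_fg` and `k_bg`. Foreground clusters are numbered first so that the binary brain mask stays recoverable from the label map.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def cluster_region(features, k, seed=0):
    """Fit a k-component GMM to (n_voxels, n_channels) features; return hard labels 0..k-1."""
    gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=seed)
    return gmm.fit_predict(features)


def build_label_map(t1, t2, brain_mask, k_fg=3, k_bg=5, seed=0):
    """Cluster brain and non-brain voxels independently on stacked T1/T2 intensities.

    Foreground clusters receive labels 1..k_fg and background clusters
    k_fg+1..k_fg+k_bg (cluster counts are illustrative assumptions).
    """
    feats = np.stack([t1, t2], axis=-1).astype(np.float64)  # shape (..., 2)
    mask = brain_mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    labels[mask] = 1 + cluster_region(feats[mask], k_fg, seed)
    labels[~mask] = 1 + k_fg + cluster_region(feats[~mask], k_bg, seed)
    return labels
```

Because the two label ranges never overlap, the brain mask is simply `(labels >= 1) & (labels <= k_fg)`, whatever spatial transformation is later applied to the label map.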
Spatial transformation on label maps
To improve robustness against anatomical variability and acquisition differences, synthetic label maps were perturbed with random affine and non-linear transformations before conversion to images. Transformations are explicitly constrained to avoid topological violations, including brain fragmentation, extrusion beyond the skull boundary, and inversion of foreground–background relationships. Affine perturbations included rotations uniformly sampled from ±180°, translations up to 6 mm, and anisotropic scaling up to ±
Parameters for Synthetic Data Augmentation
For the combined model, spatial transformations included random rotations (±180°), shifts (≤6 mm), scaling (±0.1), and warping (0.2), along with blur (0.1), additive Gaussian noise (0.0–0.2), and cropping (probability = 1.0). Although full 180° rotations are unlikely in practice, this augmentation was included to promote rotational invariance and prevent orientation bias, similar to prior synthetic segmentation frameworks. In practice, the most effective rotations were within ±30–45°. Foreground and background components were perturbed with lower-intensity deformations to preserve tissue boundaries while enhancing variability.
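As a rough illustration of how such affine perturbations can be applied to a discrete label map, the sketch below uses `scipy.ndimage.affine_transform` with nearest-neighbour interpolation so that no new label values are created. It is a simplification under stated assumptions: rotation is about a single axis only, the non-linear warp is omitted, and the default shift of 11 voxels is an approximate conversion of 6 mm at 0.54 mm spacing. The function and parameter names are hypothetical.

```python
import numpy as np
from scipy.ndimage import affine_transform


def random_affine_labels(labels, rng, max_rot_deg=180.0, max_shift_vox=11.0, max_scale=0.1):
    """Apply one random rotation (about axis 0), anisotropic scaling, and shift to a 3D label map."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    scale = np.diag(1.0 + rng.uniform(-max_scale, max_scale, size=3))
    matrix = rot @ scale  # maps output voxel coordinates to input coordinates
    shift = rng.uniform(-max_shift_vox, max_shift_vox, size=3)
    centre = (np.array(labels.shape) - 1) / 2.0
    offset = centre - matrix @ centre + shift  # rotate and scale about the volume centre
    # order=0 is nearest-neighbour interpolation, which keeps labels discrete
    return affine_transform(labels, matrix, offset=offset, order=0, mode="constant", cval=0)
```

Since brain and non-brain labels are moved by the same transform, the brain stays embedded in its surroundings after the perturbation.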
Intensity augmentations of images derived from label maps
To simulate scanner-specific intensity non-uniformities and noise, voxel intensities were perturbed by additive Gaussian noise and multiplicative bias fields. These effects approximate coil inhomogeneity, B1 bias, and random fluctuations in scanner gain that can vary across sites or acquisitions:
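One common way to realize this kind of perturbation, assumed here purely for illustration, is to multiply the synthesized image by a smooth positive bias field and then add zero-mean Gaussian noise. In the sketch below the bias field is obtained by upsampling coarse random noise and exponentiating it; the names `coarse`, `strength`, and `noise_sd` and their default values are hypothetical and are not taken from the source.

```python
import numpy as np
from scipy.ndimage import zoom


def smooth_bias_field(shape, rng, coarse=4, strength=0.3):
    """Smooth multiplicative field: linearly upsample coarse Gaussian noise, then exponentiate."""
    low = rng.normal(0.0, strength, size=(coarse,) * len(shape))
    factors = [s / coarse for s in shape]
    return np.exp(zoom(low, factors, order=1))  # strictly positive, equal to 1 when strength=0


def perturb_intensity(image, rng, noise_sd=0.1, bias_strength=0.3):
    """Return image * bias + noise for a synthesized (label-derived) image."""
    bias = smooth_bias_field(image.shape, rng, strength=bias_strength)
    noise = rng.normal(0.0, noise_sd, size=image.shape)
    return image * bias + noise
```

These perturbations act on the synthesized image only; the paired binary mask is left untouched.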
Resolution variability
To make the segmentation robust to heterogeneous acquisition protocols, synthetic images were intentionally degraded to simulate variability in slice spacing and thickness. This step mimics real-world differences across scanners and studies, where porcine MRI is often acquired at variable resolutions and anisotropic voxels. After randomly selecting an imaging axis (axial, coronal, sagittal), the synthetic volume was blurred with a Gaussian kernel to emulate slice thickness, then downsampled by trilinear interpolation to a randomly chosen spacing:
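The degradation step can be sketched as follows, under assumptions that are not specified in the text: the blur full width at half maximum is set equal to the simulated slice spacing, the upper spacing limit `max_spacing_mm` is a hypothetical value, and linear interpolation (`order=1`) in `scipy.ndimage.zoom` stands in for trilinear downsampling.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, zoom


def simulate_thick_slices(volume, rng, voxel_mm=0.54, max_spacing_mm=3.0):
    """Blur and downsample a volume along one randomly chosen axis."""
    axis = int(rng.integers(0, volume.ndim))
    spacing = rng.uniform(voxel_mm, max_spacing_mm)  # simulated slice spacing in mm
    ratio = spacing / voxel_mm                       # >= 1, in units of native voxels
    sigma = ratio / 2.355                            # Gaussian FWHM = slice spacing (assumption)
    blurred = gaussian_filter1d(volume, sigma=sigma, axis=axis)
    factors = [1.0] * volume.ndim
    factors[axis] = 1.0 / ratio
    low = zoom(blurred, factors, order=1)            # linear interpolation along the chosen axis
    return low, axis, spacing
```

In a training pipeline the degraded volume would typically be resampled back to the network's grid before use; that step is omitted here.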
Model training
Model training was seeded with a curated set of 19 manually annotated T1- and T2-weighted volumes from 7 unique pigs, including both healthy and injured conditions across multiple time points. Expert annotations were used to delineate brain foreground from extracranial background, which then served as input to our generative pipeline. Within this pipeline, tissue regions were decomposed into multimodal Gaussian mixture clusters, recombined into label maps, and perturbed through spatial and intensity transformations (including rotation, scaling, warping, noise, and blur). A labels-to-image synthesis module produced paired synthetic MR images and binary masks on the fly, producing effectively unlimited exemplars for training.
Training employed the Adam optimizer with an initial learning rate of
Both U-Nets were trained from scratch using synthetic data. Synthetic samples do not preserve voxel-wise correspondence to any original scan and are regenerated stochastically at each training iteration. Soft Dice loss was used to optimize segmentation overlap. Training was validated on held-out synthetic samples with early stopping to prevent overfitting. The pipeline was implemented in TensorFlow and trained on a single NVIDIA A100 GPU.
Evaluation metrics for segmentation performance
Segmentation accuracy was quantified using the Dice similarity:
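For two binary masks, the standard Dice similarity is twice the number of voxels in their intersection divided by the sum of their voxel counts. A minimal sketch of that computation is given below; the handling of two empty masks is a convention assumed here, not one stated in the source.

```python
import numpy as np


def dice(pred, ref):
    """Dice similarity between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, ref).sum() / denom
```

For example, a prediction of two voxels that shares one voxel with a one-voxel reference gives 2 × 1 / (2 + 1) ≈ 0.667.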
For evaluation, model predictions were quantitatively compared against expert manual annotations. Both experts independently delineated the brain masks, and their annotations served as ground-truth references. PIGSKIN was benchmarked against existing skull-stripping methods, including BET, HD-BET, SynthStrip, and UniverSeg, using identical preprocessing and evaluation protocols. These baselines were selected to represent the current spectrum of automated brain extraction tools. BET and its deep learning counterpart HD-BET are widely adopted pipelines optimized for human MRI, providing a conventional reference. SynthStrip is a recent state-of-the-art model trained on extensive synthetic human data and therefore tests cross-species generalization. UniverSeg offers a universal few-shot segmentation framework that can adapt to unseen domains, making it an appropriate benchmark for assessing transferability. Finally, Original+ serves as an internal control: a U-Net trained directly on the limited real pig data with standard augmentations, which isolates the contribution of our synthetic data generator. Together, these methods span the range from traditional to universal and synthetic-based algorithms, providing a fair and comprehensive comparison for PIGSKIN. In addition to whole-brain Dice scores, region-specific analyses were performed focusing on the olfactory bulbs and ventral brain, areas that are anatomically complex and where conventional skull-stripping pipelines often fail. These regions were defined a priori to assess model robustness in anatomically challenging zones, and qualitative and quantitative comparisons are reported in the “Results” section.
Test-time inference and resolution effects
All test scans were obtained from pigs that were not used to seed the synthetic training generator, ensuring subject-level separation between training and evaluation. At test time, we evaluated two inference strategies. First, following the SynthSeg pipeline,20 input volumes were resampled to 1 mm isotropic resolution and normalized via min–max scaling using the 1st and 99th percentiles of intensities. The images were then cropped into a standardized 192³ voxel cube, and the U-Net produced the segmentation on the resampled volume.
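The normalization and cropping steps of this first strategy can be sketched as below. Two details are assumptions rather than statements from the text: intensities outside the percentile range are clipped to [0, 1], and the crop or zero-pad is centred on the volume. Resampling to 1 mm is omitted, and the function names are hypothetical.

```python
import numpy as np


def normalize_percentile(vol, lo=1.0, hi=99.0):
    """Min-max scale using the lo-th and hi-th intensity percentiles, clipped to [0, 1]."""
    p_lo, p_hi = np.percentile(vol, [lo, hi])
    if p_hi <= p_lo:
        return np.zeros_like(vol, dtype=np.float32)  # constant image: nothing to scale
    out = (vol - p_lo) / (p_hi - p_lo)
    return np.clip(out, 0.0, 1.0).astype(np.float32)


def crop_or_pad_to_cube(vol, size=192):
    """Centre-crop axes longer than `size` and zero-pad axes shorter than `size`."""
    out = np.zeros((size,) * vol.ndim, dtype=vol.dtype)
    src, dst = [], []
    for n in vol.shape:
        if n >= size:
            start = (n - size) // 2
            src.append(slice(start, start + size))
            dst.append(slice(0, size))
        else:
            start = (size - n) // 2
            src.append(slice(0, n))
            dst.append(slice(start, start + n))
    out[tuple(dst)] = vol[tuple(src)]
    return out
```

A typical call order under these assumptions would be `crop_or_pad_to_cube(normalize_percentile(volume))`.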
Second, to evaluate the effect of downsampling-induced resolution loss—which occurs when images are interpolated from their native acquisition space to a coarser 1 mm grid—inference was performed directly on the native-resolution data (0.54 mm isotropic for the CCI cohort and 0.70 mm isotropic for the rotational-acceleration cohort). This one-step design preserved fine-scale anatomical details, such as the olfactory bulbs and ventral brain regions, which were otherwise prone to under-segmentation when evaluated exclusively at 1 mm resolution. Unless otherwise specified, all results reported in this study were obtained using native-resolution inference. Quantitative segmentation performance across training and testing resolutions is summarized in Table 3.
Impact of Resampling on PIGSKIN Performance (Dice ± standard deviation).
SD, standard deviation.
The bold value indicates the best segmentation performance.
For both inference modes, softmax probability maps were thresholded, and the largest connected component was retained to suppress spurious islands. The average runtime was approximately 12 s per scan for 1 mm inference and 18 s per scan for native-resolution inference, including resampling, network evaluation, and post-processing.
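A minimal sketch of this post-processing step is shown below, using `scipy.ndimage.label` with its default face connectivity. The threshold of 0.5 is an assumption, since the value used is not stated, and the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import label


def largest_component_mask(prob, threshold=0.5):
    """Threshold a probability map and keep only the largest connected component."""
    binary = np.asarray(prob) > threshold
    labelled, n_components = label(binary)  # component ids 1..n, background 0
    if n_components == 0:
        return binary  # nothing above threshold
    sizes = np.bincount(labelled.ravel())
    sizes[0] = 0  # never select the background
    return labelled == np.argmax(sizes)
```

Small disconnected islands of false-positive voxels are discarded, while the main brain component is returned unchanged.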
Results
Benchmarking against existing methods
Across all test sets, PIGSKIN achieved performance comparable to expert consistency, with Dice scores approaching those of human annotators when evaluated against Expert 1 (≈0.97; Fig. 4). Although some disagreement persists with Expert 2 at anatomically ambiguous regions—particularly around the olfactory bulbs, which form irregular, highly convoluted structures—overall inter-rater consistency remains high. Accordingly, inter-rater Dice largely reflects localized boundary uncertainty rather than systematic disagreement. Trained on spatially consistent synthetic variations, the model produces smoother and more internally consistent boundaries, which in turn leads to higher Dice scores relative to either expert when considered individually. Performance remained stable across cohorts and MRI modalities, demonstrating strong generalization. A U-Net trained solely on the original, non-synthetic images, including both healthy and injured brains with standard augmentations, failed to reach comparable accuracy, underscoring the importance of our synthetic data generator. Importantly, PIGSKIN’s performance approached the reference level of inter-rater variability (Expert 1 vs. Expert 2: 0.958 ± 0.006), effectively matching expert inter-rater consistency (Fig. 4, Table 1). Notably, while Expert 2 annotations exhibited greater variability, PIGSKIN maintained stable predictions, suggesting that its apparent “errors” often reflect ambiguities in manual ground truth rather than algorithmic limitations. Together, these results establish PIGSKIN as the first porcine brain extraction framework to achieve human-expert performance, clearly surpassing existing automated methods. When the same model was trained and tested on 1 mm resampled images, accuracy dropped sharply (Dice = 0.73 ± 0.18). Visual inspection revealed systematic false negatives in anatomically complex regions. 
In several cases, one olfactory bulb was entirely missing from the predicted mask, highlighting the loss of critical detail when inference is performed at lower resolution.

Comparison of segmentation performance across models using Dice score. Dice values are computed independently with respect to each expert annotation, with solid bars denoting evaluation against Expert 1 and hatched bars denoting evaluation against Expert 2. The proposed GMM-based PIGSKIN model demonstrates substantially improved performance relative to baseline methods (UniverSeg, BET, and Original+), achieving expert-level agreement with both annotators. The “E1/E2” bar represents inter-rater agreement between Expert 1 and Expert 2 and serves as a reference measure of annotation variability. Original+ refers to training on the original images with additional random augmentation (rotation and noise). Error bars indicate ±standard deviation.
When trained and tested at native resolution, the one-step model achieved the highest Dice accuracy (0.968 ± 0.007). Evaluating the same network on 1 mm resampled images slightly reduced performance (Dice = 0.962 ± 0.006), with visible loss of fine structures such as the olfactory bulbs. Training and testing entirely on 1 mm resampled inputs further degraded performance (Dice = 0.916 ± 0.011).
PIGSKIN demonstrated highly consistent performance across T1- and T2-weighted test sets (Dice ≈ 0.968–0.970, Fig. 5), indicating strong generalization across MRI contrasts (Fig. 6). When trained jointly on T1 and T2 volumes, the model exhibited slightly improved stability, supporting a modality-agnostic framework that performs reliably irrespective of input sequence. This capability is particularly valuable in preclinical neuroimaging, where acquisition parameters vary across scanners, sites, and injury paradigms. Unlike existing pipelines constrained to a single contrast, PIGSKIN leverages synthetic multimodal clustering during training to achieve a unified segmentation solution that robustly accommodates both T1- and T2-weighted inputs.

Example test case showing PIGSKIN segmentation results on T1- and T2-weighted MRI from the same pig. Red overlays represent the segmented region obtained using PIGSKIN, demonstrating consistent performance across both modalities.

Performance of GMM-based models trained on T1-only, T2-only, or combined T1 + T2 data, evaluated on separate T1 and T2 test sets. All models achieve high Dice scores across both modalities, with slightly improved consistency when trained with both T1 and T2. This demonstrates the generalization capability of the GMM framework across MRI contrasts.
Effect of injury inclusion on model generalization
Models trained on both pre- and post-injury data significantly outperformed those trained only on pre-injury cases. Dice scores improved from 0.91 to 0.96 in rotational datasets and from 0.95 to 0.97 in CCI datasets when tested post-injury. Exposure to pathological variability enhanced generalization across both healthy and injured animals (Fig. 7).
PIGSKIN reliably segmented anatomically complex regions such as the olfactory bulbs and ventral brain, even under injury-induced disruption. Accurate separation was achieved in both healthy and injured conditions, outperforming conventional methods that often failed in these regions (Fig. 8).

Comparison of Dice performance between models trained only on pre-injury data (dark bars) and models trained on both pre- and post-injury data (light bars). Results are shown separately for rotational and CCI datasets under both pre-injury (left) and post-injury (right) test conditions. In all cases, the combined model achieved significantly higher Dice scores than the pre-injury model (*p < 0.05, **p < 0.01, ***p < 0.001), demonstrating that including injury data during training improves generalization across both healthy and injured animals. CCI, controlled cortical impact.

Examples of segmentation results across different injury models and datasets. The first row shows pre-CCI injury with clear separation of olfactory bulbs from ventral brain tissue (axial view, box 1). The second row highlights post-CCI injury cases, where focal lesions are visible in the coronal (box 2) and sagittal (box 3) views. The third and fourth rows display pre- and post-diffuse injury scans, respectively, with boxes 4 and 5 illustrating the olfactory region, where boundaries are irregular and complex.
Overall, qualitative inspection confirmed accurate delineation of the brain boundary, including challenging regions such as the olfactory bulbs and ventral surface (Fig. 5). PIGSKIN preserved fine-scale anatomical detail without including extracranial tissues.
Discussion
Here, we introduce PIGSKIN, a synthetic data–driven framework for porcine brain extraction. PIGSKIN achieved Dice scores comparable to inter-rater reliability, outperforming conventional human brain segmentation pipelines such as BET and UniverSeg,9,21 which are designed and trained exclusively on human MRI and do not generalize well when applied to porcine anatomy. This limitation likely arises because these tools assume smooth cortical boundaries, limited extracranial tissue, and consistent head shape, assumptions that are violated in pigs due to thick subcutaneous fat, elongated snout, and ventral brain extension into the olfactory and cerebellar regions.
Agreement between two trained raters (Dice ≈ 0.958 ± 0.006) provides a benchmark for parity with human performance. PIGSKIN’s performance (Dice ≈ 0.97) approached or slightly exceeded this level, indicating consistency comparable to manual delineation. Residual discrepancies between experts were localized near the ventral brainstem and olfactory bulbs, regions where boundaries are diffuse or anatomically irregular. PIGSKIN preserved these structures with high fidelity, reducing false negatives and improving agreement (Fig. 4).
Anatomically, segmentation of the olfactory bulbs and ventral brain has remained challenging because of their irregular geometry, their contact with extracranial fat, and intensity variation caused by air–tissue interfaces. Conventional human skull-stripping pipelines frequently truncate or merge these areas because their priors enforce compact, convex masks. By contrast, PIGSKIN’s synthetic multimodal augmentation systematically exposed the network to such anatomical variability (elongated bulbs, ventral flattening, and heterogeneous soft-tissue boundaries), enabling it to learn invariant spatial representations. Accordingly, as shown in Figure 8, the model maintained consistent delineation across both healthy and injured animals, exceeding the performance of conventional methods in these anatomically complex regions.
The failures of conventional pipelines likely reflect priors that enforce compact, convex brain masks and expect minimal extracranial intensity overlap. In pigs, extracranial fat, muscle, and air pockets near the sinuses break these assumptions, resulting in under-segmentation of the brain and leakage of the mask into soft tissue. PIGSKIN addresses this by decoupling intracranial and extracranial intensity modeling during synthetic data generation. Foreground and background are clustered independently and recombined under randomized spatial and intensity perturbations, generating anatomically plausible and diverse training samples (Fig. 3). This augmentation strategy exposes the model to broader porcine variability, reducing the risk of overfitting to a single anatomy and supporting robust generalization across scanners and injury paradigms.
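The decoupled synthesis described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a small one-dimensional k-means stands in for whatever clustering PIGSKIN uses, the cluster counts and intensity ranges are hypothetical values, and the randomized spatial transformations are omitted. The point is only that brain and non-brain voxels are clustered separately and then assigned intensities with different amounts of variability.

```python
import numpy as np


def cluster_1d(values, k, rng, n_iter=10):
    """Tiny 1D k-means (Lloyd's algorithm); returns a cluster index per value."""
    values = np.asarray(values, dtype=float)
    centers = rng.choice(values, size=k, replace=False)
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(n_iter):
        # Assign each value to its nearest center, then update the centers.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels


def synthesize(image, brain_mask, k_brain=3, k_bg=6, seed=0):
    """Build a label map and a synthetic image from one annotated scan.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    mask = np.asarray(brain_mask, dtype=bool)

    # Cluster brain and non-brain intensities independently.
    # Brain labels: 0..k_brain-1; non-brain labels: k_brain..k_brain+k_bg-1.
    labels = np.zeros(image.shape, dtype=int)
    labels[mask] = cluster_1d(image[mask], k_brain, rng)
    labels[~mask] = k_brain + cluster_1d(image[~mask], k_bg, rng)

    # Draw one random mean intensity per label: a narrow range for brain
    # (constrained variability) and a wide range for extracranial tissue.
    means = np.concatenate([
        rng.uniform(0.4, 0.6, size=k_brain),
        rng.uniform(0.0, 1.0, size=k_bg),
    ])
    synth = means[labels] + rng.normal(0.0, 0.02, size=image.shape)
    return labels, synth
```

Because the two label ranges never overlap, the brain stays embedded in its surroundings in every generated label map, while each new seed yields a different intensity pairing for the same anatomy.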
Resolution also influenced segmentation fidelity. While prior work often downsampled to standardized resolutions for convenience, porcine brains contain small high-contrast structures that are easily degraded by interpolation. We observed that maintaining native spatial resolution preserved anatomical detail in addition to yielding quantitative gains. The olfactory bulbs and ventral brain, key sites of deformation in TBI, are especially sensitive to partial-volume effects. Accordingly, training at native resolution supports quantitative accuracy for downstream volumetric and morphometric analyses while still generalizing well to coarser datasets (Supplementary Table S1).
Collectively, these design choices (synthetic multimodal augmentation, independent tissue modeling, and preservation of native spatial detail) enabled generalization across breeds (Yorkshire, Landrace), scanners, and injury types without large annotated datasets. Beyond this application, the framework suggests a strategy for data-limited domains: model anatomical variability through controlled synthesis rather than manual labeling.
In summary, PIGSKIN provides reproducible, expert-consistent brain extraction with minimal supervision and offers a framework for anatomically grounded synthetic training in preclinical and translational neuroimaging.
Limitations and future work
Several limitations should be acknowledged. The evaluation cohort was modest compared with human benchmarks, which may limit generalizability and prevent stratification by injury severity or site. Synthetic augmentation, while powerful, was seeded from up to seven pigs, potentially constraining anatomical diversity. Training data were largely derived from CCI animals, raising the possibility of model bias toward this injury model.22,23 Broader validation across breeds, injury types, and sites will be essential.
Methodologically, augmentation was largely slice-based, which may limit 3D spatial continuity. Future work should incorporate volumetric generative models, hybrid CNN–transformer architectures, and self-supervised pretraining. Finally, expert annotations are subject to variability, especially in injured regions. Consensus-based or probabilistic ground truth would strengthen evaluation frameworks.
Conclusion
PIGSKIN delivers the first automated solution for porcine brain extraction with consistency comparable to that of expert raters. By uniting synthetic data generation with a deep learning framework, it achieves robust performance across modalities, injury models, and datasets. Beyond filling a critical gap in preclinical neuroimaging, PIGSKIN illustrates how synthetic training can enable scalable and reproducible pipelines in resource-limited domains. This tool lays the groundwork for quantitative porcine TBI research and strengthens the translational bridge to human neuroimaging. To accelerate adoption, we have released the full code and trained models publicly at https://github.com/dadashkarimi/PIGSKIN.
Ethical Statement
All animal procedures were performed in accordance with institutional guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Pennsylvania. All experiments complied with relevant ethical regulations for animal research.
Data Availability Statement
The trained model weights and source code for the PIGSKIN framework are publicly available at https://github.com/dadashkarimi/PIGSKIN. The imaging data used in this study are not publicly available due to ethical and institutional restrictions related to animal research but may be made available from the corresponding author upon reasonable request and subject to institutional approval.
Authors’ Contributions
J.D. conceived and designed the study, developed the methodology, implemented the algorithms, performed all experiments, and drafted the article. D.P. and N.B.L. contributed to data annotation and validation. R.D.-A. made significant contributions to the interpretation of results and to the discussion of findings. D.H.S. provided an additional cohort of animals subjected to rotational injury. V.J. contributed to initial data acquisition and preprocessing. J.W. oversaw the porcine studies and provided the animals used in this work. R.V. conceived and designed the study, supervised the project, and critically reviewed the article. All authors reviewed and approved the final article.
Footnotes
Acknowledgment
We acknowledge Sai Krishna C. Annavazala for creating and maintaining the containerized deployment environment for PIGSKIN.
Author Disclosure Statement
The authors declare that there are no commercial or financial relationships that could be construed as a potential conflict of interest.
Funding Information
This work was supported by the National Institutes of Health (NIH) under grants R01NS123034 and R01CA278819 and by the U.S. Department of Defense (DoD) under awards W81XWH20-1-0838, W81XWH-20-1-0901, and HT9425-23-1-1039. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the DoD.
