Abstract
Three-dimensional prenatal ultrasound scans of a baby’s facial features have become increasingly popular among parents in both private and clinical settings. Ultrasound practitioners often draw on their experience to identify factors that influence image quality when discussing scan outcomes with parents. This study aims to identify the maternal, fetal and technical factors that may affect the quality of three-dimensional souvenir face images during an ultrasound scan. A retrospective quality review study was performed with data from a single-centre research study, including ultrasound videos of the fetal growth scans, three-dimensional facial ultrasound acquisitions and post-processing steps. A total of 342 three-dimensional surface-rendered images were attempted from 41 singleton pregnancy subjects (average gestational age 26.69 weeks, range: 21–30). The retrospective image quality for all images was assessed by two observers. Univariable ordinal regression was used to investigate the associations between demographic/technical factors and the best image quality achievable. Across the 41 pregnancies, the mean three-dimensional acquisition time was 03:07 (mm:ss; range: 01:22–05:31). In total, 49% of women had at least one good or moderate quality image, and 51% of women had a poor quality or failed three-dimensional scan as the best quality possible. Image quality was associated with placenta site, explaining 18% of the variation (p < 0.05). We found one maternal–fetal factor with a high impact on three-dimensional image quality of the prenatal face; nonetheless, sonographer skill, training and other technical factors may be employed to minimise the impact of detrimental factors.
Background
It is a modern-day trend for expectant parents to purchase three-dimensional (3D) scan packages from commercial screening companies, where the primary aim is not medical. 1 Ultrasound (US) scans are appealing to expectant parents, who see them as the ‘confirmation of new life’. 1 Consequently, these scans are often described as ‘keepsake baby ultrasound’, ‘boutique baby imaging’ and ‘bonding scans’, which emphasise the experiential and emotional aspects of the scans rather than their clinical purpose. These labels highlight the role of such scans in fostering parental connection and enhancing the sensory and emotional experiences of expectant parents.
When women choose to have these scans, there is often an overall positive experience; however, there can be a considerable amount of disappointment with the quality of images they receive. 2 In a recent interview study, the authors reported that only one of the six women discussing commercial scans was happy with the quality of the images she received. 2 Others expressed disappointment at the inability to get any photographs, even after repeated visits to the scan room, or they reported problems with the fetal position or advanced gestational age (GA) limiting the image quality. Suboptimal 3D/four-dimensional (4D) images could lead to disappointment after such grand expectations, especially as a considerable amount of money is often paid personally, by parents, for these scan services. 3 Parents may perceive limited transparency from scan service providers about the likelihood of achieving a high-quality 3D scan, and service providers may have limited data on which to base predictions about a successful and complete examination, instead warning parents of any mitigations that may be required, for example a rescan at a later date.
Clinical adoption of 3D and 4D US technologies has been slow, due to training needs, additional time required for 3D acquisition and post-processing, perceptions of limited clinical added value and the higher capital costs required for new equipment. 4
The main technical barriers to achieving good-quality 3D US images include acoustic shadowing from highly reflective structures
Despite the importance of 3D clinical diagnosis, image quality techniques
Methods
Sample
Retrospective data sets were obtained from an ethically approved study
Of the 122 participants initially identified, application of the predefined exclusion criteria resulted in a final sample of 41 participants; the high exclusion rate reflects enrichment for structural anomalies in the original research cohort compared with routine clinical practice.
US images of the fetal anatomy were acquired in a live 4D mode with additional 3D volume acquisitions obtained for research purposes. In addition, as part of the imaging protocol, all participants were offered a 3D souvenir image of their baby’s face. This study focuses solely on the pseudonymised facial 3D image data that includes documented attempts to obtain good-quality 3D views as well as the images that would have been included in the final selection offered to parents.
Data sets
All scans were performed using a Philips EPIQ US scanner (Best Campus, Netherlands) equipped with an xMatrix, PureWave crystal technology X6-1 matrix transducer (frequency range of 6–1 MHz) and/or a broadband V6-2 curved volume array transducer (frequency range of 6–2 MHz).
Each participant’s data set included a full US examination video (MP4 format), dedicated 3D images saved by the US operator (.PNG format), and basic maternal and fetal demographic data collected in a REDCap research database. 10 Variables collected from these data included GA (in completed weeks), maternal age, maternal body mass index (BMI), maternal ethnicity, placental location, fetal position, and the presence or absence of fibroids.
Maternal BMI, taken from pre-pregnancy records, was stratified into three categories (normal weight, overweight and obese) according to the World Health Organization (WHO) Adult BMI Classification (normal weight: 18.5–24.9 kg/m²; overweight: 25.0–29.9 kg/m²; obesity: 30.0 kg/m² or above). 11
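The stratification described above can be sketched as a small lookup. This is a minimal illustration of the WHO adult cut-offs, not the study's own data-preparation code; the function name `classify_bmi` is hypothetical, and the underweight band is included only for completeness (no participants in this study fell into it).

```python
def classify_bmi(bmi: float) -> str:
    """Return the WHO adult BMI category for a BMI value in kg/m^2.

    Illustrative helper (hypothetical name); cut-offs follow the WHO
    adult classification quoted in the text.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical values, not study data.
example_categories = [classify_bmi(value) for value in (22.4, 27.1, 33.0)]
```

Using half-open intervals (for example, below 25.0 rather than up to 24.9) avoids gaps when BMI is recorded to more than one decimal place.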
Additional information was gathered directly from the retrospective data set and recorded, including:
The start time of the fetal profile imaging, capture time of the 3D acquisitions, end time of the 3D post-processing and total scan time were recorded.
The presence of a fetal part in front of the face was noted, namely, limbs, cord, or placenta.
Measurement of the fluid level from the fetal face to uterine or placental surface, which, for the purpose of this study, will be called uterine-fetal amniotic assessment (UFAA), was recorded and subjectively scored as either good, moderate, or poor (see criteria in Figure 1).

Uterine-fetal amniotic assessment (UFAA) scoring system.
Technical factors were also recorded for each scan, including:
whether the 3D render line was straight or curved;
whether the region of interest (ROI) was appropriate (i.e. included the facial profile) or inappropriate (i.e. other anatomical areas which were not relevant to this study);
if the acquisition plane was midsagittal, oblique or another plane;
if anatomical reference points for the centre of the ROI were set at the forehead, nose, or chin level; and
the volume acquisition rate, defined as the frequency at which the image was acquired (low, medium, or high).
Imaging assessment
An objective image quality scoring system was agreed upon by consensus between two operators (with 13 and 7 years of obstetric 3D US experience, respectively). These two operators then independently reviewed all images and scored them blindly using the predefined quantitative scoring system. Scores were recorded independently and without discussion.
This scoring system included predefined criteria for assessing the images; see Table 1 for the scoring criteria and Figure 2 for image examples.
Image quality scoring criteria.

Scoring system examples for 3D ultrasound image assessment.
Statistical analysis
Data were collected and prepared in Microsoft Excel (16.0.14334.20468). Analyses were conducted in IBM SPSS Statistics for macOS (Version 29). Data were summarised using counts and percentages for categorical variables and means and ranges for continuous variables as appropriate.
Associations between best image quality scores (from observer 1) and maternal–fetal factors were explored using univariable ordinal regression, with results reported as odds ratios (OR) and 95% confidence intervals. Predictors were entered according to their measurement scale: ordinal (amniotic fluid levels, BMI, GA groups in weeks, fibroids), nominal (fetal parts obscuring face, placenta site, fetal position), or continuous (acquisition time in minutes). The primary outcome, best image quality for each subject, was treated as an ordinal dependent variable, collapsed into three categories before analysis to support stable parameter estimation. Because of the fixed sample size and the potential for sparse cell counts across several categorical predictors, predictor categories were also collapsed where clinically appropriate before modelling. In addition, the effect size was calculated to quantify how much of the variation in highest image quality any statistically significant factor explained. The proportional odds assumption for the ordinal regression was assessed using the test of parallel lines, with the assumption considered violated if p < 0.05.
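For readers less familiar with ordinal regression, the proportional odds (cumulative logit) model can be written in a general textbook form as below. The symbols are notation introduced here for illustration and are not taken from the study's software output: Y is the collapsed best image quality category (J = 3 levels), x is a single predictor, θ_j are the category thresholds and β is the common slope.

```latex
\[
\mathrm{logit}\, P(Y \le j \mid x) = \theta_j - \beta x, \qquad j = 1, \dots, J - 1
\]
\[
\mathrm{OR} = e^{\beta}
\]
```

Under this sign convention, an OR above 1 corresponds to higher odds of a better image quality category. The test of parallel lines compares this model, with a single β shared across all thresholds, against a model that allows a separate slope at each threshold.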
Inter-observer agreement for the 3D image scoring (per image) was assessed using Cohen’s kappa coefficient and calculated based on the paired independent scores before any reconciliation. For all statistical tests, a p value of <0.05 was considered statistically significant.
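As a minimal sketch of the agreement statistic, Cohen's kappa compares the observed proportion of matching scores with the proportion expected by chance from each observer's category frequencies. The function below assumes the unweighted form of kappa (the text does not state whether weighting was applied), and the example scores are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of images given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for six images (not study data).
observer_1 = ["good", "moderate", "poor", "poor", "fail", "fail"]
observer_2 = ["good", "poor", "poor", "poor", "fail", "moderate"]
kappa = cohens_kappa(observer_1, observer_2)
```

Because kappa corrects for chance agreement, it can be noticeably lower than the raw percentage of matching scores when a few categories dominate.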
Results
This study population comprised 41 singleton pregnancies, yielding a total of 342 3D fetal facial images. Of these 342 facial views, 314 remained in the sample after exclusions, because 28 images were not of the fetal face and it was evident that, at the time of acquisition, the sonographer was not focusing on obtaining a facial view but rather imaging the feet, the back of the head or the whole fetus.
Of the 41 singleton subjects included in the study, 20 participants (49%) were primigravida, 8 (20%) had one previous pregnancy, and 3 (7%) had two or more previous pregnancies; pregnancy history was unknown in 10 participants (24% of cases) due to missing background data. With respect to BMI, 19 participants (46%) were within the normal range, 13 participants (32%) were overweight, and eight participants (20%) were obese. There were no underweight participants. The mean maternal age was 34.5 years, range: 21–44, and the mean GA was 26 weeks 1 day, range: 21–30. The mean 3D acquisition time was 3 minutes 7 seconds (range: 1:22–5:31, minutes:seconds). The number of attempts required to acquire fetal facial views ranged from 1 to 39, with a mean of 8.2 attempts. The distribution of these maternal–fetal characteristics stratified by image quality is shown in Appendix 1, and Table 2 illustrates how the categories were collapsed for univariate ordinal regression.
A. Maternal–fetal and technical characteristics table.
B. Maternofetal characteristics per subject with collapsed groupings for analysis, n = 41.
Overall, 20 women (49%) achieved at least one image rated as good or moderate within their acquired image set. Consequently, 21 women (51%) had poor or fail as their highest image quality score. This indicates that just over half of the scans did not yield an image of at least moderate quality.
At least one image of good quality was obtained in five of 41 singleton pregnancies (12%) (Figure 3). As their highest score, 15 women (37%) had an image rated as moderate, 15 (37%) had an image rated as poor, and 6 (14%) had an image rated as fail.
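The per-subject outcome reported here is the highest score among all images attempted for that subject. A minimal sketch of that reduction is shown below; the function name, subject identifiers and scores are hypothetical and are not drawn from the study data set.

```python
# Ordinal scale from lowest to highest quality, as used in the scoring system.
QUALITY_ORDER = ["fail", "poor", "moderate", "good"]
RANK = {label: position for position, label in enumerate(QUALITY_ORDER)}


def best_score_per_subject(image_scores):
    """Map each subject ID to the highest-ranked score among its images.

    image_scores is an iterable of (subject_id, score) pairs.
    """
    best = {}
    for subject_id, score in image_scores:
        if subject_id not in best or RANK[score] > RANK[best[subject_id]]:
            best[subject_id] = score
    return best


# Hypothetical per-image scores (not study data).
scores = [
    ("S01", "poor"), ("S01", "moderate"), ("S01", "fail"),
    ("S02", "fail"), ("S02", "fail"),
    ("S03", "good"), ("S03", "poor"),
]
best = best_score_per_subject(scores)
```

Summarising by the best image rather than the average reflects the practical question for parents, namely whether at least one acceptable picture could be obtained.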

(a) Bar charts of proportions for: Sonographer Image Score results from all 314 3D obstetric facial views. (b) Bar charts of proportions for: Highest image quality score result on a per subject basis (n = 41).
When assessing image quality, of the 314 individual images analysed, 11 (4%) were scored as good, 55 (18%) had a score of moderate, 162 (52%) were scored as poor, and 86 (27%) were rated as fail, reflecting attempts rather than the final selected image (Figure 3).
Across 314 3D image acquisition attempts, technical characteristics showed notable variation (see Table 3 and Appendix 1). Render lines were used in similar proportions, with 53% straight and 47% curved. However, straight render lines appeared slightly more favourable for image quality compared with curved lines, as 82% of the good images and 54% of moderate images had a straight render line and proportions of poor or failed scans were comparable for both.
Technical scan characteristics assessed on an image-by-image basis during evaluation.
The ROI was judged appropriate in 58% of images, whereas 42% were acquired with an ROI considered too large. Image quality was balanced across all categories among scans with ROIs that were either clearly appropriate or clearly too large.
The most frequently used anatomic reference plane was the nose (74%), compared with the forehead (17%) and chin (9%). The predominance of the nose as the anatomical reference makes it difficult to assess potential differences in image quality across reference planes.
Nearly half of all images were acquired in a true midsagittal plane (49%), with the remainder obtained in transverse (35%) or other planes (15%).
For acquisition rate, images were obtained at slow, medium, or fast speeds, distributed as follows: slow, 0–20 Hz (11%); medium, 21–40 Hz (42%); and fast, >40 Hz (46%). Fast acquisition rates were therefore the most frequently used. Higher rates tended to correspond with improved image quality: 63% of good scans and 60% of moderate scans had a fast acquisition rate. Slow acquisition rates more often corresponded with poorer-quality scans.
A total of 314 images were assessed by the sonographers. Inter-observer comparison between sonographer 1 and sonographer 2 demonstrated disagreement in 36 of the 86 fail-rated images (42%), 38 of the 162 poor images (23%), 14 of the 55 moderate images (25%), and seven of the 11 good images (64%). This corresponded to a moderate degree of agreement between the two sonographers’ scoring (Cohen’s kappa = 0.516, p < 0.001).
In univariable ordinal logistic regression, placenta site was the only patient factor significantly associated with image quality (χ² = 6.75, df = 1, p = 0.009), and the proportional odds (parallel lines) assumption was not violated (p = 0.825; Appendix 2). Images acquired with an anterior placenta had substantially lower odds of achieving higher image quality compared with non-anterior placentas (OR = 0.20, 95% CI: 0.05–0.70). This model explained approximately 18% of the variability in image quality (Nagelkerke R² = 0.18), consistent with a moderate effect. Other examined factors (amniotic fluid, BMI, GA, fetal position, fetal parts, fibroids, and acquisition time) were not significantly associated with image quality in univariable analyses. Given the fixed sample size (N = 41), the study was powered to detect moderate-to-large effects; therefore, smaller associations with these variables cannot be excluded. The ordinal regression associations are tabulated in Appendix 2.
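To make the interpretation of an odds ratio and its confidence interval concrete, the sketch below converts a log-odds coefficient and its standard error into an OR with a Wald 95% interval. The function name and the input values are hypothetical and are not taken from the study's regression output; the Wald form is an assumption, as the text does not state how the interval was derived.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error to (OR, lower, upper).

    Uses a Wald interval: exp(beta - z * se) to exp(beta + z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error (not study output).
odds_ratio, lower, upper = odds_ratio_with_ci(beta=-1.6, se=0.65)

# An interval lying entirely below 1 indicates significantly lower odds
# of a better outcome category at the chosen confidence level.
interval_excludes_one = upper < 1.0 or lower > 1.0
```

A negative coefficient gives an OR below 1, and an interval that does not cross 1 corresponds to a statistically significant association at the 5% level.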
Discussion
This retrospective study examined the quality of 3D facial views obtained as part of an US research study, with the aim of exploring factors affecting 3D visualisation of the fetal face. The main finding was that anterior placentas were associated with lower image quality scores, and this was the only factor that could be demonstrated statistically. This does not mean that other maternal or fetal characteristics were not relevant in the ability to obtain a high-quality view because small effect sizes would have been unlikely to be detected due to small sample sizes and sparse data across the assessed category levels (see Appendix 2). Technical factors that could be adjusted by the operator also varied within the same subject and across image quality levels (see Appendix 1), suggesting operator decisions about how to optimise these factors, for example ROI and acquisition plane, depend on the circumstances of the scan. In addition, we found an overall 3D image success rate of 49% for good or moderate views as the highest quality achieved. It may be true that operator behavioural factors were at play, as in this setting, 3D pictures were optional and acquisition limited to the end of a research-focused scan. This may have resulted in a time limitation or less motivation to repeat attempts over an extended period if an adequate 2D souvenir picture was obtainable.
Factors which influence image quality in prenatal 3D facial US
The findings of this study did not support previous research reporting that GA is associated with image quality. Khoury et al. 7 found that US examinations at a later GA of approximately 24 weeks increased the ability to visualise fetal morphology in both obese women and women of normal BMI. The authors also found that increased estimated fetal size was associated with improved visualisation. Similarly, Kurjak et al. 12 proposed that the most favourable GA for 3D scanning of the fetal face was from 23 to 30 weeks, a range that largely overlaps with the 21–30 weeks included in this study. In that study, the fetal face was successfully visualised in a high proportion of cases during this period, without significantly extending the duration of the prenatal 2D US scan. In this study, the highest GA of 30 weeks was not associated with the greatest proportion of good and moderate image quality scores.
This study did not provide evidence of an association between maternal BMI and image quality, in contrast with previous studies. 7,8 This may be explained by the smaller sample in this study, with narrower variability in BMI. Previous studies show a significant reduction in image quality and facial visualisation in the presence of uterine fibroids. In contrast, this study did not identify an adverse impact of fibroids on image quality. This result is not unexpected, as the study included only a small number of cases with large or multiple fibroids (approximately 2% of participants), and such sparse data increase the likelihood that the test was underpowered.
An additional source of uncertainty is parity, as data were missing for 24% of participants. Therefore, an inference regarding an association between parity and image quality could not be reliably drawn due to the potential for sampling bias, and this variable was not explored further in the analysis.
Technical adaptations to improve quality
A systematic review by Rotten and Levaillant 13 acknowledges that 3D US analysis has similar limitations to 2D US. These include situations where the fetus is closely adjacent to the uterine wall or the placenta, or where amniotic fluid volume is decreased, in which case anatomical structures may be obscured in both 2D and 3D modes. This study found no association between UFAA or fetal parts overlying the face and reduced image quality. These findings did not support previous evidence that poor UFAA levels can negatively affect US image quality, particularly for detailed facial assessment. A study by De Jong-Pleij et al. 14 outlines that the profile view plays a key role in the examination of the fetal face. It is, therefore, important to obtain a correct midsagittal view. An evaluation of the fetal profile in an incorrect midsagittal plane can lead to diagnostic inaccuracies on 2D US. With regard to the anatomical reference point, the nose is used as a landmark to obtain the fetal profile, by adjusting the scanning plane and rotating around the y- and z-axes of the fetus in search of the true midsagittal plane. This supports the idea that the tip of the nose can be used as an exact landmark when searching for the profile. In this study, it was challenging to compare image quality across anatomical reference points and acquisition planes. The findings contradict other similar studies, likely because of the small sample of 41 pregnant women; caution must therefore be applied, as the null findings may reflect type 2 errors.
Merz and colleagues 5,6 suggested that sufficient amniotic fluid and the absence of overlying structures are necessary conditions for optimal 3D image acquisition. The authors state that while there is evidence that 3D technology plays a pivotal role in advanced facial assessment, the quality of 3D images is heavily reliant on maternal–fetal factors. Consequently, it is paramount that technical adaptations and examiner skills are utilised to improve image quality, thereby increasing diagnostic accuracy for future routine implementation in clinical assessment.
Strengths and weaknesses
The limitations of this study are its retrospective nature, the small sample of 41 cases, the few cases involving uterine fibroids, and the narrower BMI range in comparison with earlier, larger studies. In addition, some technical information was not recorded, such as whether a patient was allowed time to walk to encourage a change in fetal position, or whether the maternal position was actively adjusted during image acquisition. Furthermore, co-variation between maternal, fetal, and scan-related characteristics could not be assessed in a sample of this size.
Another limitation of this study is that the primary purpose of the scan was not to clinically assess the face in 3D but rather to provide a souvenir image. Consequently, the time spent on attempting to acquire the facial images cannot be considered a strong predictor of image quality. The facial images obtained for this study were taken as an additional, complimentary souvenir picture for parents and were not the primary focus of the original study. As a result, there was no standardised guidance regarding the specific amount of time which should be dedicated to acquiring these images.
Finally, despite the implementation of a detailed quality control process by incorporating a structured scoring system, only a moderate level of inter-observer agreement was achieved. This suggests a high degree of subjectivity in the assessment of 3D fetal facial image quality. Future research could address what level of information parents currently receive regarding 3D US examinations, particularly in relation to disclaimers about the variable success rates of 3D US imaging.
Recommendations for practice
These results did not entirely reinforce findings reported by previous research or what is anecdotally known about the challenges of 3D image acquisition in pregnancy. Even if all contributing factors are favourable at the time of scanning, any movements of the fetus are beyond the control of the sonographer and can lead to reduced image quality. Based on the findings of this study, the following recommendations can be proposed:
Parents should be informed of the limitations of fetal face visualisation and the potential challenges associated with placental site.
The findings can support sonographers in setting realistic expectations for image quality during clinical examinations and provide evidence-based reasoning that relates to suboptimal visualisation encountered in individual cases.
Conclusion
Multiple factors are related to achieving high-quality 3D US images of the fetal face. We found that an anterior placenta site is significantly associated with poorer image quality; however, the literature also supports GA and BMI as important factors, underscoring the interplay between patient characteristics and sonographer expertise, technical skill, and persistence in maximising opportunities for optimal scan quality. Future research should focus on parental expectation management and examine more closely the communication aspects of the 3D examination, including development of co-produced disclaimer information for parents regarding the success rates of 3D US scans.
Footnotes
Appendix
Univariate ordinal regression associations with image quality.
| Predictor and categories ‘reference’ | χ² (df) | p value | Test of parallel lines, p value | OR (95% CI) | Nagelkerke R² | Notes |
|---|---|---|---|---|---|---|
| Anterior versus ‘Nonanterior’ | 6.75 (1) | 0.009 | 0.825 | 0.2 (0.05–0.70) | 0.177 | Collapsed |
| Good Moderate Poor | 0.58 (2) | 0.75 | 0.423 | | 0.016 | Collapsed |
| Normal | 1.15 (2) | 0.56 | 0.766 | | 0.030 | Sparse data |
| 21–22 | 7.74 (4) | 0.101 | 0.231 | | 0.201 | Collapsed |
| 1 minute | 9.98 (5) | 0.076 | 0.003 | | 0.252 | Assumption violation, exploratory only |
| Cephalic versus ‘non-cephalic’ | 0.88 (1) | 0.347 | 0.516 | | 0.025 | Collapsed and sparse |
| None | 2.84 (2) | 0.241 | 0.995 | | 0.078 | Collapsed |
| None | 0.24 (1) | 0.622 | 0.919 | | 0.007 | Collapsed |
Odds ratios >1 indicate higher odds of better image quality. Univariate models were run separately for each predictor.
Acknowledgements
The author would like to thank the staff involved in the iFIND project, including clinicians, researchers, research delivery and image acquisition staff, with specific thanks to Dr Emily Skelton. Thanks to Dr Christina Menni for her additional research support during the initial MSc phase of this work. Finally, thank you to all the families involved in the research study.
Ethical contributions
All relevant ethical guidelines have been followed, and ethics committee approvals have been obtained. All maternal participants gave written informed consent for the use of data acquired under the Intelligent Fetal Imaging and Diagnosis project (iFIND, REC 14/LO/1806), with ethical approval given by the Health Research Authority (London - South East).
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contributions
Jacqueline Matthew, PhD – Conceptualisation, study design, statistical interpretation and editing. King’s College London.
Emma Chung, PhD – Supervision. King’s College London.
Robert Wright, PhD – Data Curation, Resource. King’s College London.
Mary A. Rutherford, PhD – Conceptualisation, Funding Acquisition. Guy’s and St Thomas’ NHS Foundation Trust.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was fulfilled, in part, by AC’s MSc, which was self-funded. In addition, the work was supported by an NIHR Clinical Doctoral Research Fellowship awarded to Jacqueline Matthew [NIHR300555], the Wellcome Trust and EPSRC IEH award [102431] for the iFIND project [WT 220160/Z/20/Z], the NIHR Clinical Research Facility (CRF) at Guy’s and St Thomas’ and by the National Institute for Health Research Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data are available on request to the corresponding author.
Data sharing
Not applicable.
