Reproducibility of phantom-based quality assurance parameters in real-time ultrasound imaging

Abstract

Background

In a large radiological center, the ultrasound (US) quality assurance (QA) program involves several professionals. Although the operator and the parameters utilized can contribute to the results, the selected QA parameters should still reflect the quality of the US scanner, not the measuring process.

Purpose

To evaluate the reproducibility of recommended phantom-based US QA parameters in a realistic environment.

Material and Methods

Six sonographers measured six high-end US scanners with 20 transducers using a general purpose phantom. Every transducer was measured altogether seven times, using one frequency per transducer. The QA parameters studied were homogeneity, visualization depth, vertical and horizontal distance measurements, axial and lateral resolution, and the correct visibility of anechoic and high-contrast masses. The evaluation of the homogeneity was based on visual observations. Inter-observer interquartile ranges were computed for the grading of the masses. For the other QA parameters, the mean inter- and intra-observer coefficients of variation (CoV) were calculated. In addition, the symmetry of the reverberations when imaging air with a clean transducer was checked.

Results

The mean inter-observer CoVs were: visualization depth 11 ± 4%, vertical distance 1.7 ± 0.4%, horizontal distance 1.4 ± 0.6%, axial resolution 22 ± 7%, and lateral resolution 16 ± 8%. The mean intra-observer values were about half of these values with similar standard deviations. The visual evaluation of the homogeneity and the symmetry of the reverberations produced false-positive findings in 5% of the cases, but were found useful in detecting a defective transducer. The grading of the masses had mean interquartile ranges of 20–30% of the grading scale.

Conclusion

The inter-observer variability in measuring phantom-based QA parameters can be relatively high. This should be considered when implementing a phantom-based QA protocol and evaluating the results.

Keywords

Ultrasound, QA/QC, technical aspects

Most of the methods utilized for B-mode ultrasound (US) quality assurance (QA) are based on assessing the image quality using a phantom (1–11). Often, the images are analyzed visually and with manual measurements during the scanning session (1–5). To attain more objectivity, computer programs for automatic image analysis have also been developed (6–10). Opinions differ, however, on whether parameters from phantom measurements can detect changes in image quality in modern US scanners (5, 12). Besides phantom measurements, other approaches can be utilized, e.g. testing systems that examine the functionality of every element of the transducer (13, 14). While the new American College of Radiology (ACR) standard on monitoring the performance of real-time US equipment lists recommended image quality parameters to be checked regularly, it does not always specify what methodology should be utilized in gathering the results (11).

In our imaging center, 19 units perform radiological US examinations. Due to the wide spatial distribution of the units and the amount of equipment, several persons must be involved in the QA process. At the same time, no extra human resources are available, so all the measurements performed must be carefully considered. In ultrasound imaging, besides the quality of the scanner, the imaging results depend on the operator handling the transducer and on the imaging parameters utilized. Factors affecting the accuracy of phantom measurements include human error, image pixel size and resolution, caliper precision, velocity and distance calibration, and phantom-related errors, e.g. the propagation of ultrasound in the phantom (15). If the expected human error is prominent in the measuring process, the QA parameter has little value in the continuing performance assessment. The purpose of this work was to study how reliably the recommended QA parameters could be reproduced by several sonographers in a realistic setting. A general purpose phantom with manual analysis was utilized, since this is still the most straightforward approach for US QA, with existing standards and recommendations (1, 2). This study was part of a larger US QA project and was also linked with the training of new sonographers.

Material and Methods

Six high-end US scanners with altogether 20 transducers located in three different radiological units were measured by six sonographers. The scanners had been purchased from three different vendors during 2004–2009. The transducers included linear very high-frequency transducers, linear high-frequency transducers, micro-convex or sector transducers, and convex low-frequency transducers (Table 1). Every transducer was measured with one frequency: the lowest available for the convex low-frequency transducers and the highest available for the other types of transducers.

Table 1
Scanners, transducers, and frequencies used in this study

Manufacturer	Model	Scanners and transducers (n)	Purchase year(s)	Transducer model	Frequency (MHz)
GE	Logiq E9	1	2009	ML6-15-D	15.0
				C1–5	2.0
	LOGIQ 9	2	2004, 2006	M12L	12.0
				9L	8.0
				8C	8.0
				4C	2.0
Philips	iU22	1	2007	L12–5	Res* (5–12)
				L9–3	Res* (3–9)
				C8–5	Res* (5–5)
				C5–1	Pen* (1–5)
Toshiba	Aplio XG	1	2008	PLT-1204BX	14.0
				PVT-375BT	1.9
	Aplio 80	1	2004	PLT-1204AT	14.0
				PLT-704AT	11.0
				PST-65AT	8.5
				PVT-375BT	1.9

*In some transducers, the user can choose between three frequency ranges: penetration, general, and resolution. The total frequency ranges of these transducers, specified by the manufacturer, are given in parentheses.

Res = resolution, Pen = penetration.

Every sonographer measured five scanners once and one scanner twice. Every scanner was thus measured seven times, during no more than 10 days to ensure the same condition of the transducer and the scanner in every measurement. The phantom was a CIRS model 040 general purpose phantom with Zerdine™ as the background material and with nylon filament targets (diameter of 0.1 mm) and anechoic and high-contrast masses (16). The attenuation in the measurements was 0.5 dB/(MHz cm).

The QA parameters studied are described in Table 2. The sonographers performed the measurements without any knowledge about the earlier QA results for this equipment. Thus the expected results were not known, except for the vertical and horizontal distance measurements.

Table 2
Measured QA parameters, their detailed descriptions, and analysis methods

Parameter	Description	Analysis
Air image	Is the reverberation pattern symmetrical when imaging air with a clean transducer?	Visual (yes/no)
	Are there vertical peaks in the image intensity values reaching the surface of the transducer?	Visual (yes/no)
Homogeneity	Does the field of view have horizontally homogeneous image intensity values?	Visual (yes/no)
Visualization depth	How deep can the transducer see the speckle before the noise starts to dominate (with an adequate axial field of view for the transducer and the frequency, with the maximum output level and with the lateral focus set as deep as possible)?	Visual (depth in mm)
Vertical distance calibration	Measurement of 40 mm vertical distance (using the depth from 20 to 60 mm)	Manual measurement (mm)
Horizontal distance	Measurement of 30 mm horizontal distance (at a depth of 30 mm)	Manual measurement (mm)
	Measurement of 40 mm horizontal distance (at a depth of 100 mm)	Manual measurement (mm)
Axial resolution	1. How small a vertical separation of filaments can be detected (discrete values of 0.5, 1, 2, 3, 4, and 5 mm) at a depth of 20–25 mm?	1. Visual (discrete value of 0.5, 1, 2, 3, 4 or 5 mm)
	2. Measurement of the vertical dimension of a filament at a depth of 25 mm	2. Manual measurement (mm)
Lateral resolution	1. How small a horizontal separation of filaments can be detected (discrete values of 1, 2, 3, 4, and 5 mm) at depths of 20, 60 and 100 mm?	1. Visual (discrete value of 1, 2, 3, 4 or 5 mm)
	2. Measurement of the horizontal dimension of a filament at depths of 20, 60 and 100 mm	2. Manual measurement (mm)
Contrast resolution	1. Evaluation of anechoic masses with the diameter of 2 mm (at a depth of 20 mm), 4 mm (40 mm), 6 mm (60 mm) and 8 mm (80 mm)	1. Visual evaluation of the roundness of the mass and the lack of fill-in, grading 2/1/0 (both properties / only one of them / none of them fulfilled)
	2. Evaluation of high-contrast masses (+15 dB) with the diameter of 2 mm (at a depth of 20 mm), 4 mm (40 mm), 6 mm (60 mm) and 8 mm (80 mm)	2. Visual evaluation of the roundness of the mass, grading 1/0 (round/not round)

The QA protocol for every transducer was implemented and saved in the scanners, and all sonographers selected the same stored protocol for a given transducer. Exactly the same imaging parameter settings could not always be implemented on the different scanner models. The main principle was to apply minimal signal processing, meaning that the more sophisticated features, such as harmonic imaging, spatial/frequency compounding, and the manufacturer's proprietary filtering methods, were switched off. For the other parameters, the following guidelines were used: output power was set to the maximum level, time-gain compensation (TGC) was adjusted to achieve a uniform signal across the field of view, and dynamic range was set to 60 dB. Individual gain settings were allowed to obtain the best possible visibility for each measurement. Rejection, edge enhancement, and frame averaging were set to the lowest level possible. A linear gray map was selected. Line density was set to the highest possible value. A single focus at the depth of the structure studied in each measurement was utilized, and the overall imaging depth was selected to allow the best visibility of the structure. Whenever possible, the abdomen was selected as the body part to be imaged. The thermal index (TI), mechanical index (MI), and frame rate values were noted down for each measurement. In addition, an image from each measurement was saved to a picture archiving and communication system (PACS).

The sonographers received one day of training, including a lecture on quality control and a workshop demonstrating how different scanner settings influence QA with a phantom. In addition, the measurement protocol was reviewed in groups of two sonographers using one of the scanners included in this study. Before the measurements started, the sonographers also had the chance to practice with the phantom and receive feedback on the results.

If one of the sonographers found an abnormal sign in the air image or in the homogeneity image possibly referring to dead elements in the transducer, all the corresponding measurements by the other sonographers were also checked when analyzing the results. The number of cases interpreted as false-positives was counted.

To estimate the human error in measuring the visualization depth, vertical and horizontal distances, and axial and lateral resolution, the coefficient of variation (CoV) (15) was computed for each transducer, including the results from all sonographers (inter-observer CoV) or results from the same sonographer (intra-observer CoV). To obtain a single inter- and intra-observer estimate for every QA parameter, the mean inter- and intra-observer CoVs were computed, including all the transducers.
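The CoV computation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the readings are hypothetical, the sample standard deviation is assumed, the inter-observer CoV is assumed to pool all seven results per transducer, and the intra-observer CoV is assumed to use only the repeated measurements of the same sonographer.

```python
from statistics import mean, stdev

def cov_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)

def inter_observer_cov(per_sonographer):
    """CoV over the pooled results of all sonographers for one transducer."""
    pooled = [v for vals in per_sonographer.values() for v in vals]
    return cov_percent(pooled)

def intra_observer_cov(per_sonographer):
    """Mean CoV over sonographers who measured this transducer more than once."""
    covs = [cov_percent(vals) for vals in per_sonographer.values() if len(vals) > 1]
    return mean(covs) if covs else None

# Hypothetical visualization depths in mm: transducer -> sonographer -> readings.
readings = {
    "linear_A": {"s1": [60.0, 62.0], "s2": [58.0], "s3": [64.0],
                 "s4": [61.0], "s5": [59.0], "s6": [63.0]},
    "sector_B": {"s1": [90.0], "s2": [100.0, 104.0], "s3": [110.0],
                 "s4": [95.0], "s5": [105.0], "s6": [96.0]},
}

inter = {name: inter_observer_cov(r) for name, r in readings.items()}
intra = {name: intra_observer_cov(r) for name, r in readings.items()}

# Single estimate per QA parameter: mean and standard deviation over transducers.
inter_vals = list(inter.values())
intra_vals = [v for v in intra.values() if v is not None]
print(f"inter-observer CoV: {mean(inter_vals):.1f} ± {stdev(inter_vals):.1f} %")
print(f"intra-observer CoV: {mean(intra_vals):.1f} ± {stdev(intra_vals):.1f} %")
```

Because the CoV divides by the mean, the same absolute scatter produces a larger CoV for parameters with small mean values, which is relevant for the resolution measurements.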

For the grades of the anechoic and high-contrast masses (Table 2), the inter-observer interquartile range (17) was computed for every transducer. Also, the mean inter-observer interquartile ranges were determined including all transducers.
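As an illustration of the interquartile range computation, the following sketch uses hypothetical grades. The quartile convention is an assumption: several definitions exist and the one used in the study is not specified here, so the "inclusive" method of the Python standard library is used only as an example.

```python
from statistics import mean, quantiles

def interquartile_range(grades):
    """Q3 - Q1 using the 'inclusive' quartile method (an assumed convention)."""
    q1, _median, q3 = quantiles(grades, n=4, method="inclusive")
    return q3 - q1

# Hypothetical grades (0/1/2) of one anechoic mass from seven measurements.
grades_by_transducer = {
    "linear_A": [2, 2, 1, 2, 2, 1, 2],
    "convex_B": [1, 1, 1, 1, 1, 1, 1],
}

iqr = {name: interquartile_range(g) for name, g in grades_by_transducer.items()}

# Mean inter-observer interquartile range over all transducers.
mean_iqr = mean(list(iqr.values()))
print(iqr, mean_iqr)
```

With a grading scale of only two or three levels, the interquartile range can take few distinct values, so a single diverging grade can change it noticeably.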

The distance and resolution measurements and the evaluation of the masses were performed only in the range of the measured visualization depth of the individual transducer with the chosen frequency, although the high-contrast targets may have been visible deeper. Thus, the very high-frequency linear transducers were excluded from the lateral resolution measurements in the depth of 60 mm. Also, the deepest lateral resolution measurements as well as the deepest horizontal distance measurements were only performed with the low-frequency convex transducers. In the distance and resolution measurements, zooming was allowed, as it should not have a significant effect on the results (3).

Results

Doubtful non-symmetry or inhomogeneity in the air image or in the homogeneity image, interpreted as a false-positive, was reported in 5% of the images. With one linear transducer, in six out of the seven measurements, severe non-symmetry of the reverberations in the air image was noticed. Although not known by the sonographers, the transducer in question had 22 dead elements in one corner of the transducer, detected earlier using a FirstCall Aperio™ transducer measurement system (Sonora Medical Systems Inc., Longmont, CO, USA).

The mean inter- and intra-observer CoVs for the visualization depth, distance, and resolution measurements are presented in Table 3. The mean inter-observer interquartile ranges for the anechoic and high-contrast masses were 0.4 ± 0.3 and 0.3 ± 0.2, respectively.

Table 3
Mean inter- and intra-observer coefficients of variation (CoV) with standard deviations (std) for the visualization depth, vertical and horizontal distance measurements, and axial and lateral resolution measurements

QA parameter	Mean inter-observer CoV (%) ± std	Mean intra-observer CoV (%) ± std
Visualization depth	11 ± 4	4 ± 6
Vertical distance calibration	1.7 ± 0.4	0.9 ± 0.6
Horizontal distance, both depths	1.4 ± 0.6	0.8 ± 0.9
Axial resolution, separation of filaments	23 ± 19	6 ± 17
Axial resolution, filament dimension	22 ± 7	12 ± 11
Lateral resolution, separation of filaments*
Depth: 20 mm	16 ± 16	3 ± 13
Depth: 60 mm	17 ± 11	18 ± 18
Depth: 100 mm	9 ± 10	7 ± 14
All depths	16 ± 14	9 ± 16
Lateral resolution, filament dimension
Depth: 20 mm	19 ± 8	12 ± 10
Depth: 60 mm	14 ± 7	11 ± 3
Depth: 100 mm	12 ± 7	9 ± 7
All depths	16 ± 8	11 ± 8

In measuring the visualization depth, one result for a high-frequency linear transducer was excluded. This result was only 63% of the mean of the other corresponding results for the transducer. The lower TI and MI values in this measurement suggested that the output power had probably been accidentally set too low. The low-frequency convex transducers were excluded from the visualization depth results, since the bottom of the phantom at a depth of 180 mm could be seen with every low-frequency convex transducer at a frequency of about 2 MHz.

Due to a misunderstanding, the axial and lateral resolution results obtained by measuring the dimensions of a filament were only available from four sonographers. Thus, the number of these measurements in estimating the inter-observer precision was only four or five per transducer, and the intra-observer estimates for the resolution measurements were lacking for the transducers of two scanners (GE Logiq 9 and Toshiba Aplio XG).

Discussion

The purpose of this work was to evaluate the reproducibility of phantom-based QA parameters in a realistic setting. In a large radiological center, the QA must be performed by several professionals – inevitably some more experienced than others. In this work, six sonographers measured typical recommended phantom-based US QA parameters using six scanners with altogether 20 transducers. Every transducer was measured seven times.

The evaluation of the air image and of the homogeneity image produced false-positive findings in 5% of these images altogether. The one known defective transducer in this study (22 consecutive missing elements in one corner) was detected in six of seven measurements in the air image. However, the curved edge on the other side of the homogeneity image of this linear transducer, due to the missing elements, was not noticed by any of the sonographers when performing the measurements.

In general, it is not clear how small a number of missing elements can be detected in a phantom image in the first place; this probably also depends on the transducer type (14), the total number of elements, and the aperture size. This question was outside the scope of this work.

The inter-observer precision for the visualization depth was low (mean CoV of 11%). The images saved to PACS were all very similar for the same transducer, but the interpretation of the depth at which the noise started to dominate the speckle varied. The American Association of Physicists in Medicine (AAPM) and the American Institute of Ultrasound in Medicine (AIUM) recommend a defect level of 10 mm for the change in the visualization depth relative to the baseline value from the acceptance test (1, 2). Typical visualization depths for the linear and micro-convex or sector transducers included in this study, with the imaging parameters utilized, varied between 40 and 110 mm. A CoV of 11% thus corresponds to inter-observer standard deviations of about 4–12 mm.
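The conversion from a relative CoV to an absolute spread in millimetres is simple arithmetic and can be checked with a short sketch. The function name is illustrative only; the depths, the CoV of 11%, and the 10 mm defect level are the values quoted in the text.

```python
def sd_from_cov(mean_mm, cov_percent):
    """Standard deviation in mm implied by a CoV (in percent) at a given mean value."""
    return mean_mm * cov_percent / 100.0

DEFECT_LEVEL_MM = 10.0  # AAPM/AIUM defect level for the change in visualization depth
MEAN_INTER_OBSERVER_COV = 11.0  # percent

for depth_mm in (40, 110):
    sd_mm = sd_from_cov(depth_mm, MEAN_INTER_OBSERVER_COV)
    exceeds = sd_mm > DEFECT_LEVEL_MM
    print(f"depth {depth_mm} mm: SD about {sd_mm:.1f} mm, exceeds defect level: {exceeds}")
```

At the shallow end of the range the implied standard deviation stays below the defect level, whereas at the deep end a single standard deviation already exceeds it.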

The vertical and horizontal distance measurements had mean inter-observer CoVs of 1.7% and 1.4%, respectively. The AAPM and AIUM recommend that the vertical distance error should not exceed 2% and the horizontal 3% (1, 2). In this study, the inter-observer measurement precision alone was close to the AIUM and AAPM vertical distance defect level.

The methods utilized for the resolution measurements produced relatively high inter-observer variations, with CoVs of 9–23%. This could partly be due to the small mean values, varying between 0.4 and 2 mm, used in computing the CoVs. On the other hand, the results were of the same order of magnitude as the estimated reproducibility of resolution measurements in Dudley et al. (5). With method 1, diverging results were more rarely seen than with method 2, but the differences between the diverging results were bigger. This was obviously due to the discrete results with method 1. With method 2, using a continuous scale, exactly the same results were obtained less frequently, but the differences between the diverging results were smaller. The defect level recommended for the axial resolution by the AAPM (1) and AIUM (2) is 1 mm, or 2 mm for frequencies below 4 MHz. The recommended defect level for lateral resolution depends on the focal length, frequency, and aperture diameter (1).

The evaluation of the anechoic and high-contrast masses did not seem to give very useful information in this study, since the divergence between the results was high when compared to the scale of grading.
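The relation between the mean interquartile ranges reported in the Results (0.4 for the anechoic and 0.3 for the high-contrast masses) and the grading scales can be made explicit with a short calculation. The sketch assumes that the width of a grading scale is the difference between its highest and lowest grade, i.e. 2 for the 2/1/0 grading and 1 for the 1/0 grading.

```python
def iqr_percent_of_scale(mean_iqr, scale_width):
    """Mean interquartile range expressed as a percentage of the grading scale width."""
    return 100.0 * mean_iqr / scale_width

anechoic = iqr_percent_of_scale(0.4, 2)       # grades 0, 1, 2
high_contrast = iqr_percent_of_scale(0.3, 1)  # grades 0, 1
print(f"anechoic: {anechoic:.0f} %, high-contrast: {high_contrast:.0f} %")
```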

Automatic estimation of the QA parameters from the images could result in more repeatable analysis (6–10). For example, Gorny et al. (9) found the standard deviation of measuring the visualization depth to vary between 0.4 and 2 mm with their automatic analysis methods. In our study, a suitable, properly tested analysis program was not available. Consistent scanning of the images by different operators would still be needed even if the images were automatically analyzed. Another, less practical possibility would be the use of a special transducer holder (9).

Different types of transducers – linear very high-frequency transducers, linear high-frequency transducers, micro-convex or sector transducers and convex low-frequency transducers – were included in the computation of the mean CoVs. The inter- and intra-observer variabilities may also depend on the transducer type. For example, the visualization depth results obtained with the micro-convex or sector transducers were clearly more variable than those obtained with the linear transducers. Also, the inter-observer precision among the convex transducers when measuring the lateral resolution (filament separation) near the phantom surface was worse than with the linear transducers, probably due to the greater freedom in directing a convex transducer. Still, most of the results did not show clear differences that depended on the transducer type. The number of transducers was too small for a more specific type-wise analysis.

There were some aspects in the measurements and in the interpretation of the results which should have been emphasized more in teaching phantom-based QA before the study. The sources of the problems faced were found efficiently thanks to the availability of the images and important measurement parameters afterwards. Valuable information and experience on the teaching and learning process itself and on creating an effective measurement protocol was obtained during the study. In general, working with a phantom was also found to be a valuable learning tool for the sonographers.

In conclusion, the inter-observer variability in measuring phantom-based QA parameters in a large imaging center can be relatively high. In this study with manual analysis of the QA images, the recommended defect levels for some of the QA parameters could be reached due to the inter-observer variability alone. The inter-observer variability should be carefully considered to avoid useless efforts in performing QA and wrong conclusions from the results.

Conflict of interest: None.

References

1. Goodsitt, Carson, Witt, et al. Real-time B-mode ultrasound quality control test procedures. Report of AAPM Ultrasound Task Group No. 1. Med Phys 1998;25:1385–406.

2. AIUM Technical Standards Committee. Quality control manual for gray-scale ultrasound scanners – stage 2. Laurel, MD: American Institute of Ultrasound in Medicine, 1995.

3. Kanal, Kofler, Groth. Comparison of selected ultrasound performance tests with varying overall receiver gain and dynamic range, using conventional and magnified field of view. Med Phys 1998;25:642–7.

4. Tradup, Hangiandreou, Taubel. Comparison of ultrasound quality assurance phantom measurements from matched and mixed scanner-transducer combinations. J Appl Clin Med Phys 2003;4:239–47.

5. Dudley, Griffith, Houldsworth, et al. A review of two alternative ultrasound quality assurance programmes. Eur J Ultrasound 2001;12:233–45.

6. Gibson, Dudley, Griffith. A computerised quality control testing system for B-mode ultrasound. Ultrasound Med Biol 2001;27:1697–711.

7. Thijssen, van Wijk, Cuypers MHM. Performance testing of medical echo/Doppler equipment. Eur J Ultrasound 2002;15:151–64.

8. Browne, Watson, Gibson, et al. Objective measurements of image quality. Ultrasound Med Biol 2004;30:229–37.

9. Gorny, Tradup, Hangiandreou. Implementation and validation of three automated methods for measuring ultrasound maximum depth of penetration: Application to ultrasound quality control. Med Phys 2005;32:2615–28.

10. Thijssen, Weijers, de Korte. Objective performance testing and quality assurance of medical ultrasound equipment. Ultrasound Med Biol 2007;33:460–71.

11. ACR technical standard for diagnostic medical physics performance monitoring of real time ultrasound equipment. See http://www.acr.org/SecondaryMainMenuCategories/quality_safety/guidelines/med_phys/us_equipment.aspx

12. Moore, Gessert, Schafer. The Need for Evidence-Based Quality Assurance in the Modern Ultrasound Clinical Laboratory. Ultrasound 2005;13:158–63.

13. Weigang, Moore, Gessert, et al. The methods and effects of transducer degradation on image quality and the clinical efficacy of diagnostic sonography. J Diagnostic Med Sonography 2003;19:3–13.

14. Mårtensson, Olsson, Segall, et al. High incidence of defective ultrasound transducers in use in routine clinical practice. Eur J Echocardiogr 2009;10:389–94.

15. Dudley. B-mode measurements. In: Hoskins, Thrush, Martin, et al., eds. Diagnostic ultrasound: Physics and equipment. London: Greenwich Medical Media, 2003.

16. CIRS. General purpose multi-tissue ultrasound phantom, Model 040: User guide and technical information.

17. Milton, Arnold. Introduction to probability and statistics: principles and applications for engineering and the computing sciences. 3rd edn. Singapore: McGraw-Hill Book Co, 1995.