Validation of a Skin-Lesion Image-Matching Algorithm Based on Computer Vision Technology

Abstract

Background: Melanoma incidence is increasing globally, but consistently accurate skin-lesion classification methods remain elusive. We developed a simple software system to classify potentially all types of skin lesions. In the current study, we evaluated the system's ability to identify melanomas with a diameter of 10 mm or larger. Materials and Methods: The skin-lesion classification system is composed of a proprietary database of nearly 12,000 diagnosed skin-lesion images and a computer algorithm based on the principles of content-based image retrieval. The algorithm compares characteristics of new skin-lesion images with images in the database to identify the nearest-match diagnosis. Results: Nearly all classification accuracy measures for this new system exceeded 90%, with results for sensitivity of 90.4% (95% confidence interval, 85.6–93.7%), specificity of 91.5% (85.4–95.2%), positive predictive value of 94.5% (90.4–96.9%), negative predictive value of 85.5% (78.7–90.4%), and overall classification accuracy of 90.8% (87.2–93.4%). Conclusions: The image-matching algorithm performed with high accuracy for the classification of larger melanomas. Furthermore, the system does not require a dermoscope or any other specialized hardware; any close-focusing camera will do. This system has the potential to be an inexpensive and accurate tool for the evaluation of skin lesions in ethnically and geographically diverse populations.

Introduction

Accurate, resource-conserving methods to classify skin lesions are in great demand given the rapidly increasing incidence of melanoma globally¹ and the challenges of accessing timely, high-quality care due to the global dermatology workforce shortage.² Dermoscopy, or epiluminescent microscopy, has improved the accuracy of melanoma classification compared with unaided visual diagnosis by clinicians, but dermoscopes are relatively expensive, limiting their usefulness in low-resource countries, and the accuracy rates vary markedly depending on the experience level of the user.³⁻⁷ Several groups have developed computer algorithms to improve classification of skin lesions from digital images, but many require dermoscope images, and most have only been scored on small datasets.⁸⁻¹⁰

We have developed a unique computerized classification system that combines aspects of computer vision technology with big data methodology. The system uses a patented algorithm combined with a large proprietary database of diagnosed lesion images to match new images with database images. In the current study, we evaluated the accuracy of the algorithm in classifying melanoma lesions 10 mm or larger in a test database of histopathologically diagnosed skin lesions.

Materials and Methods

Image Database Development

Study personnel obtained written informed consent from each study participant according to the protocol approved by Fox Commercial Institutional Review Board, Ltd. (Springfield, IL).

To create a database of lesion images from a diverse population, study personnel recruited English-speaking volunteers, 18 years of age and older, from a community location in an ethnically diverse suburban neighborhood in Los Angeles County, California from April 3, 2011 through July 29, 2011. Participants self-reported basic demographic information, and they presented to study personnel one or more lesions on the visible skin that they were willing to have photographed. Study personnel then photographed the lesions using Celestron® (Torrance, CA) hand-held digital microscopes. These “microscopes” were 2-megapixel cameras with a macro lens surrounded by a ring of white light-emitting diode lights. To ensure consistent lighting and imaging distance, we attached an opaque 10-cm tube to the front of each camera. With the open end of the tube in contact with the subject's skin, all ambient light, but not the light-emitting diode light, was blocked, and the imaging distance, and therefore magnification, was fixed.

Each participant's lesion images were uploaded into the image database, and at least one of three board-certified dermatologists later reviewed and diagnosed the lesions using standard clinical criteria. For lesion images diagnosed by more than one dermatologist, we examined agreement between dermatologists, who in some cases provided more than one possible diagnosis, ranked by their degree of confidence in the diagnosis. In a subsequent qualitative review of agreement among dermatologists, we elected to eliminate diagnoses provided by one dermatologist due to substantial inconsistencies in the data and poor agreement with the other two dermatologists. We then developed a decision-tree algorithm to assign a diagnosis. If only one dermatologist reviewed the lesion image, the algorithm assigned the diagnosis listed if the dermatologist provided only a single diagnosis, or the diagnosis with highest confidence if the dermatologist provided more than one diagnosis. If the lesion image was reviewed by both dermatologists, the algorithm assigned a diagnosis based on the degree of confidence in each specific diagnosis; if the dermatologists' highest-confidence diagnoses did not match, the image was eliminated from the database. However, consistent with dermatology standard of care, if any of the diagnoses were for malignant conditions, we assigned the malignant diagnosis.
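The assignment rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's actual decision tree: the set of labels treated as malignant, the precedence of the malignancy rule over the elimination rule, and the function names are all hypothetical.

```python
from __future__ import annotations

# Assumption: the text does not list which labels count as malignant.
MALIGNANT = {"melanoma", "basal cell carcinoma", "squamous cell carcinoma"}


def assign_diagnosis(reviews: list[list[str]]) -> str | None:
    """Assign one label to a lesion image, or None if the image is dropped.

    `reviews` holds one list of diagnoses per dermatologist, ordered from
    highest to lowest confidence.
    """
    if not reviews or any(not ranked for ranked in reviews):
        raise ValueError("each reviewer must supply at least one diagnosis")

    # Malignancy rule: any malignant diagnosis from any reviewer is assigned.
    # Assumption: this rule is checked first, scanning reviewers in order.
    for ranked in reviews:
        for diagnosis in ranked:
            if diagnosis in MALIGNANT:
                return diagnosis

    # Single reviewer: take the only, or the highest-confidence, diagnosis.
    if len(reviews) == 1:
        return reviews[0][0]

    # Two reviewers: keep the image only if their top diagnoses agree.
    top_choices = {ranked[0] for ranked in reviews}
    return top_choices.pop() if len(top_choices) == 1 else None
```

For example, `assign_diagnosis([["junctional nevus"], ["lentigo"]])` drops the image because the two top diagnoses disagree, whereas a lower-ranked "melanoma" from either reviewer would be assigned under the malignancy rule.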

The participants self-reported birth year, gender, and race or ethnicity (American Indian/Alaska Native, Asian/Pacific Islander, black/African American, white/Caucasian, or other). Participants also self-reported Hispanic/Latino ethnicity (yes/no). Study personnel de-identified the data by creating a unique alphanumeric identifier for each participant that linked his or her demographic data to his or her individual lesion images.

Due to the low population prevalence of skin malignancies, we enriched the database with images of melanoma that had been previously confirmed by histopathology. We acquired the images from DermNet NZ,¹¹ a well-known and reliable source of skin-lesion images.

Image-Search Algorithm and Query Images

We created a proprietary, patent-protected, image-search algorithm that builds on proven computer vision methods, in particular from the field of content-based image retrieval (CBIR). Our algorithm compares new images of skin lesions (“query images”) with the database of diagnosed skin-lesion images (“database images”). It uses orientation- and artifact-independent image information on lesion size, color, shape, and texture to create a single high-dimensional signature for each image. The algorithm then computes the distance between the query image's signature and those of the database images to determine which database images are closest to the query image. In CBIR terms, the best matching database images are the query results. Query results are then converted into an estimate of the query diagnosis through majority voting. This is equivalent to constructing a k-nearest-neighbor classifier,¹² where a diagnosis is assigned to the query based on the frequency of diagnostic labels attached to the images in the CBIR result set. To evaluate the classifier accuracy, scoring was based on the diagnosis with the most votes.
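As a rough illustration of the retrieval-then-vote step, the sketch below classifies a query by majority vote among its k nearest database signatures. The actual signature design and distance measure are proprietary and not described here, so the short feature vectors, the Euclidean distance, the value of k, and the function name `classify` are all assumptions made for the example.

```python
import math
from collections import Counter


def classify(query, database, k=5):
    """Label a query signature by majority vote among its k nearest neighbors.

    `database` is a list of (signature, diagnosis) pairs; signatures are
    equal-length sequences of floats.
    """
    if k < 1 or not database:
        raise ValueError("need k >= 1 and a non-empty database")
    # Rank database images by Euclidean distance to the query signature.
    ranked = sorted(database, key=lambda item: math.dist(query, item[0]))
    # Count the diagnostic labels attached to the k closest images.
    votes = Counter(diagnosis for _, diagnosis in ranked[:k])
    # Ties go to the label that appears first, i.e. the nearer neighbor.
    return votes.most_common(1)[0][0]


# Toy three-dimensional signatures; real signatures are high-dimensional.
toy_database = [
    ((0.9, 0.8, 0.7), "melanoma"),
    ((0.8, 0.9, 0.6), "melanoma"),
    ((0.2, 0.1, 0.3), "junctional nevus"),
    ((0.1, 0.2, 0.2), "junctional nevus"),
    ((0.3, 0.2, 0.1), "compound nevus"),
]
label = classify((0.85, 0.8, 0.65), toy_database, k=3)
```

With these toy values the two melanoma signatures are the closest neighbors of the query, so the vote returns "melanoma".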

To assess the accuracy of the image-search algorithm in classifying melanomas, we randomly selected 129 images of nonmelanoma lesions and 208 images of melanoma lesions, all with a largest diameter of at least 10 mm. All melanoma query images were selected from the set of images acquired from DermNet NZ to ensure confirmation of the malignancy by histopathology. The nonmelanoma query images were randomly selected from the study database of images collected at the community location; all lesions imaged at the community location were diagnosed clinically. We based the sample sizes on results from prior sample-size calculations to detect a sensitivity of 85% and a specificity of 90% at a 95% confidence level.

Data Analysis

We examined the demographic characteristics of participants who contributed lesion images to the image database by determining counts and percentages across categories within age, gender, and race/ethnicity. We also examined counts and percentages of queried and database images by melanoma and nonmelanoma diagnosis and by category of diagnosis among nonmelanoma lesions.

To evaluate the ability of the image-matching algorithm to accurately discriminate between melanoma and nonmelanoma lesions, we calculated several classification accuracy measures. We calculated sensitivity (the ratio of the number of true melanomas that the algorithm correctly classified as melanoma to the number of all true melanomas), specificity (the ratio of the number of true nonmelanomas that the algorithm correctly classified as nonmelanoma to the number of all true nonmelanomas), positive predictive value (PPV) (the ratio of the number of true melanomas that the algorithm correctly classified as melanoma to the number of all lesions the algorithm classified as melanoma), negative predictive value (NPV) (the ratio of the number of true nonmelanomas that the algorithm correctly classified as nonmelanoma to the number of all lesions the algorithm classified as nonmelanoma), overall accuracy (the ratio of the number of true melanomas and true nonmelanomas correctly classified by the algorithm to the total number of lesions evaluated by the algorithm), positive likelihood ratio (the ratio of the probability that the algorithm classified a true melanoma as melanoma to the probability that it classified a true nonmelanoma as melanoma, given by sensitivity/[1 – specificity]), and negative likelihood ratio (the ratio of the probability that the algorithm classified a true melanoma as nonmelanoma to the probability that it classified a true nonmelanoma as nonmelanoma, given by [1 – sensitivity]/specificity). We calculated these estimates and their corresponding 95% confidence intervals using standard methods.¹³,¹⁴
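The measures defined above follow directly from the four cell counts of a two-by-two table. The sketch below computes them from the counts reported in Table 3. The text does not name the confidence-interval method, so the Wilson score interval used here is an assumption; the function names are likewise illustrative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def accuracy_measures(tp, fn, fp, tn):
    """Point estimates of the measures defined in the text."""
    sn = tp / (tp + fn)  # sensitivity
    sp = tn / (fp + tn)  # specificity
    return {
        "sensitivity": sn,
        "specificity": sp,
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "lr_positive": sn / (1 - sp),
        "lr_negative": (1 - sn) / sp,
    }


# Cell counts as reported in Table 3.
measures = accuracy_measures(tp=188, fn=20, fp=11, tn=118)
sn_low, sn_high = wilson_interval(188, 208)  # interval for sensitivity
sp_low, sp_high = wilson_interval(118, 129)  # interval for specificity
```

Under this assumption, the intervals for sensitivity and specificity agree with those reported in Table 3 after rounding to one decimal place.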

Results

In total, 1,900 participants were recruited by study personnel and agreed to allow study team members to capture digital images of their skin lesions for inclusion in the skin-lesion image database. Study personnel asked participants to report information about specific demographic variables, and for each variable over 90% of all recruited participants provided responses (Table 1). In addition to the recruited participants, we enriched the study database with images of histopathologically diagnosed melanoma lesions, for a total of 2,202 individual image donors. Demographic data were unavailable for the individuals who provided the additional melanoma images that we acquired to enrich the study database.

Table 1.

Demographics of the Individuals Who Contributed One or More Images to the Lesion Image Database

DEMOGRAPHIC	N	%
Age (years) at enrollment
18–24	711	32.3
25–34	412	18.7
35–44	264	12.0
45–54	220	10.0
55–64	152	6.9
65–74	79	3.6
75–84	23	1.0
85+	7	0.3
No response or unavailable^a	334	15.2
Total	2,202	100.0
Gender
Female	1,152	52.3
Male	728	33.1
Both	1	0.0
No response or unavailable^a	321	14.6
Total	2,202	100.0
Race/ethnicity
American Indian or Alaska Native	45	2.0
Asian or Pacific Islander	306	13.9
Black or African American	96	4.3
White or Caucasian	660	30.0
Other	609	27.7
No response or unavailable^a	486	22.1
Total	2,202	100.0
Hispanic or Latino ethnicity
Hispanic or Latino	862	39.2
Not Hispanic or Latino	903	41.0
No response or unavailable^a	437	19.8
Total	2,202	100.0

^a Category includes 302 individuals whose melanoma images were acquired from the DermNet NZ database.¹¹

Overall, the recruited participants were fairly young, with slightly more than half under the age of 35 years, but older participants, who are likely to present with different conditions and skin types and tones, were well represented (Table 1). Participants were predominantly female and ethnically diverse, with just under one-third identifying as white/Caucasian and nearly 40% identifying as Hispanic or Latino.

The study database of 11,780 images (Table 2) included all 11,478 images from the 1,900 participants and an additional 302 images from the DermNet NZ database. Melanoma diagnoses accounted for about 1 in 10 database images and nearly two-thirds of query images. The distribution of images by diagnosis was similar for nonmelanoma query images and the database images, as expected given that the nonmelanoma query images were randomly selected from the database images. The combined nevus diagnoses accounted for more than four-fifths of all nonmelanoma database images and just under three-quarters of all nonmelanoma query images.

Table 2.

Distribution of Queried and Database Images by Lesion Diagnosis

	QUERIED IMAGES		DATABASE IMAGES
DIAGNOSIS	N	%	N	%
Melanoma	208	61.7	1,293	11.0
Nonmelanoma	129	38.3	10,487	89.0
Total	337	100.0	11,780	100.0
Distribution of nonmelanoma diagnoses
Angioma	3	2.3	329	3.1
Basal cell carcinoma	5	3.9	140	1.3
Compound nevus	32	24.8	2,646	25.2
Dermal nevus	11	8.5	837	8.0
Dysplastic nevus	7	5.4	673	6.4
Junctional nevus	45	34.9	4,524	43.1
Lentigo	8	6.2	571	5.4
Scar	0	0.0	28	0.3
Seborrheic keratosis	15	11.6	576	5.5
Spitz nevus	1	0.8	132	1.3
Squamous cell carcinoma	2	1.6	31	0.3
Total	129	100.0	10,487	100.0

Diagnoses of all queried melanoma image lesions were confirmed by histopathology.

Nearly every measure of the algorithm's accuracy in identifying melanoma and nonmelanoma lesions 10 mm or larger exceeded 90% (Table 3). The algorithm correctly identified more than 90% of true melanomas and more than 90% of true nonmelanomas. Of the lesions the algorithm classified as melanoma, more than 94% were in fact melanoma; of the lesions it classified as nonmelanoma, more than 85% were in fact nonmelanoma. Overall, the algorithm correctly classified nearly 91% of all lesions, with a lower confidence limit greater than 87%.

Table 3.

Algorithm-Matching Results and Clinical Accuracy Measures for Melanoma Versus Nonmelanoma Lesions with Maximum Diameter of 10 mm or Larger

ALGORITHM	ESTIMATE (%)	95% CI
Matching results (n)
TP: true melanoma classified as melanoma	188
FN: true melanoma classified as nonmelanoma	20
FP: true nonmelanoma classified as melanoma	11
TN: true nonmelanoma classified as nonmelanoma	118
Clinical accuracy
SN: TP/(TP+FN)	90.4	85.6–93.7
SP: TN/(FP+TN)	91.5	85.4–95.2
Positive predictive value: TP/(TP+FP)	94.5	90.4–96.9
Negative predictive value: TN/(FN+TN)	85.5	78.7–90.4
Overall accuracy: (TP+TN)/(TP+FP+FN+TN)	90.8	87.2–93.4
Positive likelihood ratio: SN/(1 – SP)	10.6	8.9–12.7
Negative likelihood ratio: (1 – SN)/SP	0.11	0.10–0.12

CI, confidence interval; FN, false negative; FP, false positive; SN, sensitivity; SP, specificity; TN, true negative; TP, true positive.

In addition, the likelihood ratios were highly discriminatory. A true melanoma was more than 10 times as likely as a true nonmelanoma to be classified as melanoma. Conversely, a true melanoma was only about one-tenth as likely as a true nonmelanoma to be classified as nonmelanoma.

Discussion

The image-matching algorithm performed with high accuracy for the classification of larger melanoma lesions, exceeding reported accuracy rates that typically range from 70% to 86% among experienced board-certified dermatologists, the current gold standard for clinical diagnosis of skin lesions.¹⁵⁻¹⁷ Notably, in comparison with a study by Carli et al.¹⁸ that, like the present study, specifically examined accuracy rates for classification of large lesions (10 mm or larger), the image-matching algorithm outperformed practicing dermatologists visually diagnosing lesions (algorithm versus naked eye examination: sensitivity, 90.4% versus 82.9%; specificity, 91.5% versus 75.8%; and overall accuracy, 90.8% versus 79.5%). The same study¹⁸ also reported accuracy rates for visual examination combined with dermoscopy, and the reported sensitivity of the algorithm was slightly lower (algorithm versus visual examination plus dermoscopy, 90.4% versus 93.3%), but the specificity and overall accuracy were substantially higher for the algorithm (algorithm versus dermoscopy: specificity, 91.5% versus 77.3%; overall accuracy, 90.8% versus 85.6%). Several other studies reported dermatologists' diagnostic accuracy rates using dermoscopy, and the image-matching algorithm generally achieved comparable or higher sensitivity and specificity.³⁻⁷,¹⁵⁻¹⁷

The algorithm also demonstrated high PPV and NPV in this test database enriched for melanoma images. The high PPV was not surprising given that PPV and NPV are affected by the prevalence of the condition, and the prevalence in this test dataset was high. We would therefore expect the algorithm to achieve lower PPV, and correspondingly higher NPV, in a population with a lower prevalence of melanoma, as would be encountered in a typical dermatology or primary care setting. However, the lower PPV expected in a general or screened population does not negate the value of this or any other screening test because if the sensitivity of the test is very high, the potential benefits of the test due to increased survival and reduced healthcare costs through earlier detection may be greater than the cost of performing the test.
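To make the prevalence dependence concrete, the sketch below recomputes PPV and NPV from the sensitivity and specificity in Table 3 at two prevalences, using Bayes' theorem. The 1% prevalence is a hypothetical value chosen only for illustration; it is not an estimate from the study, and the function name is likewise illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


sn, sp = 188 / 208, 118 / 129  # point estimates from Table 3

# Prevalence in the enriched test set: 208 melanomas among 337 query images.
ppv_test, npv_test = predictive_values(sn, sp, 208 / 337)

# Hypothetical low-prevalence setting (1% is illustrative, not from the study).
ppv_low, npv_low = predictive_values(sn, sp, 0.01)
```

At the test-set prevalence this reproduces the PPV and NPV in Table 3. At the hypothetical 1% prevalence the PPV falls to roughly 10% while the NPV rises above 99%, even though sensitivity and specificity are unchanged.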

The likelihood ratios, which are derived mathematically from sensitivity and specificity, are not affected by the population prevalence of the disease. The algorithm's positive likelihood ratio was above the threshold of 10, which is considered strong, conclusive evidence that the disease is likely to be present, and the algorithm's small negative likelihood ratio was at the threshold of 0.1, which is considered conclusive evidence that the disease is not likely to be present.¹³

We designed the algorithm to model dermatologists' approach to skin-lesion analysis: mental matching of a query image (a patient's lesion) with a personal “database” of images learned during medical school, residency, and routine clinical practice. Accurate classification is limited only by the size of the database and the user's recall ability. Our system replicated this model by creating a proprietary skin-lesion image database that included diagnosed lesions from participants diverse in age, race/ethnicity, skin type, and specific diagnosis. The system will increase in robustness over time through the addition of new participant records to the system database, similar to the continually increasing expertise of dermatologists through the daily examination of skin lesions. However, unlike dermatologists, who will ultimately retire and take their expertise with them, our image database will continually increase in the number of records and the diversity of participants, lesion types, and presentations, including skin conditions in children.

During algorithm development, we discovered that lesion size and pathology were complex drivers of image-signature design. We therefore elected first to maximize algorithm accuracy for the identification of larger lesions that are more likely to be clinically important (presented here). However, we believe that optimization for larger, riskier lesions does not diminish the algorithm's usefulness given the critical global need for simple, inexpensive, and accurate tools to classify skin lesions. Other highly effective medical technologies have been developed that have a lower limit of resolution, such as positron emission tomography, which has been established as an important clinical tool in the evaluation and management of cancer despite the limitation of its use to larger tumors.¹⁹ However, in a second phase of algorithm development, we will conduct additional assessments of smaller lesions and apply that information to improve algorithm performance for those lesions as well as for nonmelanoma malignancies, including basal and squamous cell carcinomas, and pediatric lesions. Subsequent phases could include development of lesion evolution tracking to compare multiple images of single lesions over time because detection of changes in the growth or visual presentation of lesions is another important dimension to the classification of suspicious lesions.

A potential limitation of the system is the lack of demographic information associated with images acquired from DermNet NZ that were used to enrich the database for melanoma and that were used as query images. We elected to enrich the database with 302 histopathologically confirmed melanoma lesion images due to the low prevalence of melanoma in the population of individuals we recruited to donate skin-lesion images for the study database. The additional images provided the algorithm with a greater representation of melanoma characteristics upon which to optimize the image-matching query. If the query images and the enriched melanoma images were obtained predominantly from white individuals, then the present study results may not directly reflect the robustness of the matching algorithm within diverse populations. However, all nonmelanoma images in the study database were derived from a highly diverse population of skin-lesion image donors, and the algorithm demonstrated very high specificity and NPV, a low negative likelihood ratio, and high overall accuracy, suggesting that the algorithm classification works well even in diverse populations. It is important to note, however, that the algorithm uses only the information in the image, without supplementary demographic information, to classify the image, and the accuracy estimates (the correct classification of melanoma and nonmelanoma images) are internally valid and not affected by the lack of demographic data for this subset of images.

Another potential limitation of our image-matching algorithm is that it currently relies on a database of clinically diagnosed, rather than biopsy-proven, skin lesions. However, the CBIR technology that drives our algorithm exploits visual characteristics of the skin-lesion images that dermatologists have identified as common to a given lesion type. Therefore, even though diagnostic error is likely more frequent in our database of clinically diagnosed lesions than it would be in a database with biopsy-proven diagnoses, the visual characteristics consistent with the lesion types assigned by the dermatologists are sufficient to inform the algorithm and result in exceptionally high sensitivity and specificity of the matches. It is possible that a comparably sized database of biopsied images would give higher classification accuracy results, but the standard of care for lesion types that are clearly benign is clinical diagnosis, and therefore development of this algorithm would not be feasible if the database were restricted to biopsy-proven lesions.

In conclusion, this newly developed algorithm has the potential to improve classification of larger melanoma skin lesions, and ultimately all skin cancers, using digital images captured with low-cost cameras.

Acknowledgments

The authors wish to acknowledge the contributions of Yvonne Chen, Vice President of Operations, Lūbax, Inc., to the administration of the study. This study was funded by Lūbax, Inc.

Disclosure Statement

R.H.C. and M.S. are employees of Lūbax, Inc. R.H.C., M.S., and S.M.E. hold stock in Lūbax, Inc. S.M.E. and E.M. are paid consultants for Lūbax, Inc. J.M.K., V.A., and J.B. declare that no competing financial interests exist. J.M.K., V.A., and J.B. served as unpaid consultants on the study design and reviewed the database and algorithm output, but they have no financial or other interests in this technology or Lūbax, Inc. The terms of this arrangement have been reviewed and approved by Fox Commercial Institutional Review Board, Ltd. in accordance with its policy on objectivity in research.

References

1. Markovic, Erickson, Rao. Malignant melanoma in the 21st century, Part 1: Epidemiology, risk factors, screening, prevention, and diagnosis. Mayo Clin Proc, 2007; 82:364–380.

2. Kimball, Resnick Jr. The US dermatology workforce: A specialty remains in shortage. J Am Acad Dermatol, 2008; 59:741–745.

3. Bafounta, Beauchet, Aegerter, Saiag. Is dermoscopy (epiluminescence microscopy) useful for the diagnosis of melanoma? Results of a meta-analysis using techniques adapted to the evaluation of diagnostic tests. Arch Dermatol, 2001; 137:1343–1350.

4. Kittler, Pehamberger, Wolff, Binder. Diagnostic accuracy of dermoscopy. Lancet Oncol, 2002; 3:159–165.

5. Vestergaard, Macaskill, Holt, Menzies. Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: A meta-analysis of studies performed in a clinical setting. Br J Dermatol, 2008; 159:669–676.

6. Rajpara, Botello, Townend, Ormerod. Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Br J Dermatol, 2009; 161:591–604.

7. Stevenson, Mickan, Mallett, Ayya. Systematic review of diagnostic accuracy of reflectance confocal microscopy for melanoma diagnosis in patients with clinically equivocal skin lesions. Dermatol Pract Concept, 2013; 3:19–27.

8. Ballerini, Li, Fisher, Rees. A query-by-example content-based image retrieval system of non-melanoma skin lesions. In: Caputo, Müller, Syeda-Mahmood, Duncan, Wang, Kalpathy-Cramer, eds. Lecture notes in computer science, Vol. 5853: Medical content-based retrieval for clinical decision support. Berlin: Springer, 2010:31–38.

9. Haider, Cho, Amelard, Wong, Clausi. Enhanced classification of malignant melanoma lesions via the integration of physiological features from dermatological photographs. Conf Proc IEEE Eng Med Biol Soc, 2014; 6455–6458.

10. […], Zhou, Zheng, Cheung, Koh. Early melanoma diagnosis with mobile imaging. Conf Proc IEEE Eng Med Biol Soc, 2014; 6752–6757.

11. DermNet. The dermatology resource. July 14, 2014. Available at www.dermnetnz.org/ (last accessed October 2, 2014).

12. Duda, Hart, Stork. Pattern classification, 2nd ed. New York: Wiley, 2000.

13. Fletcher, Fletcher. Clinical epidemiology: The essentials, 5th ed. Baltimore: Lippincott Williams & Wilkins, 2005.

14. Dean, Sullivan, Soe. OpenEpi: Open source epidemiologic statistics for public health, version. Updated September 22, 2014. Available at www.OpenEpi.com (last accessed November 12, 2014).

15. Bono, Bartoli, Cascinelli, Lualdi, Maurichi, Moglia, Tragni, Tomatis, Marchesini. Melanoma detection. A prospective study comparing diagnosis with the naked eye, dermatoscopy and telespectrophotometry. Dermatology, 2002; 205:362–366.

16. Sellheyer, Bergfeld. A retrospective biopsy study of the clinical diagnostic accuracy of common skin diseases by different specialties compared with dermatology. J Am Acad Dermatol, 2005; 52:823–830.

17. Rosendahl, Tschandl, Cameron, Kittler. Diagnostic accuracy of dermatoscopy for melanocytic and nonmelanocytic pigmented lesions. J Am Acad Dermatol, 2011; 64:1068–1073.

18. Carli, De Giorgi, Chiarugi, Nardini, Mannone, Stante, Quercioli, Sestini, Giannotti. Effect of lesion size on the diagnostic performance of dermoscopy in melanoma detection. Dermatology, 2003; 206:292–296.

19. Raylman, Majewski, Wojcik, Weisenberger, Kross, Popov, Bishop. The potential role of positron emission mammography for detection of breast cancer: A phantom study. Med Phys, 2000; 27:1943–1954.