mHealth App for Risk Assessment of Pigmented and Nonpigmented Skin Lesions—A Study on Sensitivity and Specificity in Detecting Malignancy

Abstract

Background:

With the advent of smartphones, an increasing number of mHealth applications targeting melanoma identification have been developed, but none addresses the broader task of identifying both melanoma and nonmelanoma skin cancer.

Introduction:

In this study, a smartphone application that uses fractal and classical image analysis for the risk assessment of skin lesions is systematically evaluated to determine its sensitivity and specificity in the diagnosis of melanoma and nonmelanoma skin cancer, along with actinic keratosis and Bowen's disease.

Materials and Methods:

In the Department of Dermatology, Catharina Hospital Eindhoven, The Netherlands, 341 melanocytic and nonmelanocytic lesions were imaged using the SkinVision app; 239 underwent histopathological examination, while the remaining 102 lesions were clinically diagnosed as clearly benign and were not removed. The algorithm was calibrated using the images of the first 233 lesions. The calibrated version of the algorithm was then applied to the remaining 108 lesions, and the results were compared with the medical findings.

Results:

On the 108 cases used for evaluation, the algorithm achieved 80% sensitivity and 78% specificity in detecting (pre)malignant conditions.

Discussion:

Although less accurate than the dermatologist's clinical eye, the app may offer support to other professionals who are less familiar with differentiating between benign and malignant lesions.

Conclusion:

An mHealth application for the risk assessment of skin lesions was evaluated. It adds value to diagnostic tools of its type by considering pigmented and nonpigmented lesions together and by detecting signs of malignancy with high sensitivity.

Introduction

With the advances in mobile technology, an increasing number of dermatology-related mHealth applications have emerged. One segment targets melanoma detection, but few clinical studies have assessed the accuracy of these apps, which has drawn criticism.^1–5

In this context, this article presents a risk assessment algorithm for (pre)malignant lesion detection that is integrated into the SkinVision mHealth application (developed by Skin Vision B.V., The Netherlands). The app primarily targets laypersons but can also be used by nondermatologists. The app must be certified by the appropriate regulatory bodies and has already been certified in Europe, New Zealand, and Australia.

The algorithm was initially dedicated to the analysis of melanocytic lesions and to melanoma detection. In a previous clinical study, it achieved a sensitivity of 73% and a specificity of 83% in detecting melanoma.⁶

The algorithm has now been adapted and recalibrated for a much broader range of (pre)malignant skin lesions. In this study, we assess the sensitivity and specificity of the recalibrated algorithm in the diagnosis of melanoma and nonmelanoma skin cancer, along with melanoma in situ, actinic keratosis, and Bowen's disease as premalignant skin lesions, compared with clinical diagnosis and histopathological results.

Materials and Methods

Materials and Data Acquisition

We included a total of 341 lesions in 256 consecutive patients seen routinely for different skin problems, including skin cancer follow-up, at the Department of Dermatology, Catharina Hospital Eindhoven, The Netherlands, between December 2014 and April 2016, after obtaining the patients' written informed consent. The study was approved by the local ethics committee (No. 2014-41).

Consecutive patients were seen by one dermatologist and one resident in dermatology. The dermatologist selected the lesions in three situations: clinically clear benign lesions, lesions found during total body skin examination in patients with multiple previous skin malignancies, and lesions in patients referred by a general practitioner for skin malignancies. All lesions underwent visual and dermatoscopic diagnosis; in 239 cases, incisional or excisional biopsies were performed, followed by histopathological examination of the specimens. The remaining 102 lesions were clinically clearly benign and were not removed.

All skin lesions were imaged by the participating physicians using an iPhone 5 smartphone (8-megapixel autofocus camera; 1080p at 30 frames/s in video mode) and the SkinVision imaging application.⁷ Images are acquired in video mode so that a user can easily capture quality-checked images of a skin condition in real time. The resulting images are in focus, free of shadows, and contain the entire lesion of interest.
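The real-time quality check itself is described in Ref.⁷ and is not reproduced here. As a rough illustration of the idea only, the Python sketch below accepts a video frame when a variance-of-Laplacian sharpness score exceeds a threshold and the lesion mask does not touch the frame border (i.e., the lesion is fully contained). The function names, the `min_sharpness` value, and the availability of a per-frame lesion mask are assumptions; shadow detection is omitted.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of a 4-neighbour discrete Laplacian."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def frame_passes(gray: np.ndarray, lesion_mask: np.ndarray,
                 min_sharpness: float = 50.0) -> bool:
    """Hypothetical gate: frame is sharp enough and the lesion is fully inside it."""
    mask = np.asarray(lesion_mask, dtype=bool)
    if not mask.any():
        return False  # no lesion detected in this frame
    touches_border = (mask[0, :].any() or mask[-1, :].any()
                      or mask[:, 0].any() or mask[:, -1].any())
    return laplacian_variance(gray) >= min_sharpness and not touches_border


def first_good_frame(frames):
    """Return the index of the first (gray, mask) pair that passes, else None."""
    for i, (gray, mask) in enumerate(frames):
        if frame_passes(gray, mask):
            return i
    return None
```

Scanning frames until one passes mirrors the motivation for acquiring in video mode: the user does not need to time a single good shot.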

The acquired data were used as follows: 233 pigmented and nonpigmented melanocytic and nonmelanocytic lesions (Table 1) formed the training data set for calibrating the rule-based algorithm, and the remaining 108 cases (Table 1) formed the test data set for evaluating the algorithm's sensitivity and specificity in detecting (pre)malignant lesions. Programming was completed before the biopsy results were available.

Table 1.

The Pigmented and Nonpigmented Melanocytic and Nonmelanocytic Lesions Collected and Used During the Study

NO.	LESION TYPE	CALIBRATION SET: NO. OF LESIONS	CALIBRATION SET: NO. BIOPSIED	EVALUATION SET: NO. OF LESIONS	EVALUATION SET: NO. BIOPSIED
1	Basal cell carcinoma	94	94	16	16
2	Actinic keratosis	10	10	8	3
3	Squamous cell carcinoma	7	7	3	3
4	Bowen's disease	9	9	5	5
5	Melanoma	4	4	2	2
6	Melanoma in situ	2	2	1	1
7	Psoriasis	10	5	10	0
8	Eczema	3	1	0	0
9	Lichen planus-like keratosis	8	8	3	1
10	Histiocytoma	7	6	8	0
11	Folliculitis	10	3	8	0
12	Sebaceous hyperplasia	1	0	3	0
13	Angioma senilis	10	3	6	1
14	Clear cell acanthoma	1	1	1	1
15	Scar	7	1	4	0
16	Verruca vulgaris	1	1	1	1
17	Verruca seborrhoica	12	8	9	1
18	Nevus naevocellularis (dermal/compound)	12	10	11	5
19	Nevus naevocellularis (junctional/epidermal)	24	24	1	0
20	Dysplastic nevus	1	1	1	1
21	Lentigo solaris/senilis	0	0	7	0
	Total	233	198	108	41
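The split between (pre)malignant and benign lesions in the evaluation set, which serves as the reference in Tables 3 to 6, follows directly from Table 1. The short Python check below re-derives the totals quoted in the text; the grouping of rows 1 to 6 as the high-risk target classes is taken from the Results section, and the variable and function names are illustrative.

```python
# Rows of Table 1: (lesion type, calibration n, calibration biopsied,
#                   evaluation n, evaluation biopsied)
TABLE_1 = [
    ("Basal cell carcinoma", 94, 94, 16, 16),
    ("Actinic keratosis", 10, 10, 8, 3),
    ("Squamous cell carcinoma", 7, 7, 3, 3),
    ("Bowen's disease", 9, 9, 5, 5),
    ("Melanoma", 4, 4, 2, 2),
    ("Melanoma in situ", 2, 2, 1, 1),
    ("Psoriasis", 10, 5, 10, 0),
    ("Eczema", 3, 1, 0, 0),
    ("Lichen planus-like keratosis", 8, 8, 3, 1),
    ("Histiocytoma", 7, 6, 8, 0),
    ("Folliculitis", 10, 3, 8, 0),
    ("Sebaceous hyperplasia", 1, 0, 3, 0),
    ("Angioma senilis", 10, 3, 6, 1),
    ("Clear cell acanthoma", 1, 1, 1, 1),
    ("Scar", 7, 1, 4, 0),
    ("Verruca vulgaris", 1, 1, 1, 1),
    ("Verruca seborrhoica", 12, 8, 9, 1),
    ("Nevus naevocellularis (dermal/compound)", 12, 10, 11, 5),
    ("Nevus naevocellularis (junctional/epidermal)", 24, 24, 1, 0),
    ("Dysplastic nevus", 1, 1, 1, 1),
    ("Lentigo solaris/senilis", 0, 0, 7, 0),
]

# Classes expected to be rated high risk, as defined in the Results section.
HIGH_RISK_TARGETS = {
    "Basal cell carcinoma", "Actinic keratosis", "Squamous cell carcinoma",
    "Bowen's disease", "Melanoma", "Melanoma in situ",
}


def summarize(table):
    cal_n = sum(row[1] for row in table)
    cal_biopsied = sum(row[2] for row in table)
    test_n = sum(row[3] for row in table)
    test_biopsied = sum(row[4] for row in table)
    test_positive = sum(row[3] for row in table if row[0] in HIGH_RISK_TARGETS)
    return {
        "calibration": cal_n,
        "evaluation": test_n,
        "biopsied": cal_biopsied + test_biopsied,
        "not_removed": cal_n + test_n - cal_biopsied - test_biopsied,
        "evaluation_premalignant_or_malignant": test_positive,
        "evaluation_benign": test_n - test_positive,
    }


print(summarize(TABLE_1))
# {'calibration': 233, 'evaluation': 108, 'biopsied': 239, 'not_removed': 102,
#  'evaluation_premalignant_or_malignant': 35, 'evaluation_benign': 73}
```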

Risk Assessment Algorithm

The risk assessment algorithm is based on fractal and classical image analysis and has been described in detail in Ref.⁶ For this study, the rule-based algorithm was recalibrated to accommodate nonpigmented skin conditions: the training data were used to add new rules and to improve existing ones.
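Ref.⁶ describes the fractal analysis actually used; it is not reproduced here. As a generic illustration of what a fractal measure of a lesion can look like, the sketch below estimates the box-counting dimension of a binary lesion mask with NumPy. The function name and the choice of box-counting, rather than whatever fractal descriptor the app computes, are assumptions.

```python
import numpy as np


def box_counting_dimension(mask: np.ndarray) -> float:
    """Estimate the box-counting dimension of a 2-D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 2 or not mask.any():
        raise ValueError("expected a non-empty 2-D binary mask")
    # Zero-pad to a square whose side is a power of two so boxes tile exactly.
    side = 2 ** int(np.ceil(np.log2(max(mask.shape))))
    padded = np.zeros((side, side), dtype=bool)
    padded[: mask.shape[0], : mask.shape[1]] = mask

    box_sizes, counts = [], []
    box = side
    while box >= 1:
        n = side // box
        # View the image as an n x n grid of box x box tiles and count
        # the tiles that contain at least one lesion pixel.
        tiles = padded.reshape(n, box, n, box)
        counts.append(int(tiles.any(axis=(1, 3)).sum()))
        box_sizes.append(box)
        box //= 2

    # N(s) ~ s^(-D): the slope of log N against log(1/s) estimates D.
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes, dtype=float)),
                          np.log(np.array(counts, dtype=float)), 1)
    return float(slope)
```

A filled region yields a dimension close to 2 and a thin line close to 1; irregular, ragged borders fall in between, which is the intuition behind using fractal measures for lesion assessment.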

To increase the specificity for nonpigmented lesions and the sensitivity for melanoma, the following new parameters were also taken into account: lesion area, mean gray-scale value and standard deviation over the lesion, and circularity of the lesion extracted from the fractal map. Lesion detection followed the procedure described in Ref.⁷
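A minimal NumPy sketch of the four added parameters, computed from a grayscale image and a binary lesion mask, is shown below. It is an illustration under stated assumptions, not the app's code: the paper extracts circularity from the fractal map, whereas here circularity is approximated as the lesion area divided by the area of the smallest centroid-centred circle enclosing all lesion pixels (about 1 for a disk, lower for elongated or irregular shapes). The function name `lesion_features` is hypothetical.

```python
import numpy as np


def lesion_features(gray: np.ndarray, mask: np.ndarray) -> dict:
    """Area, mean/std gray value over the lesion, and a simple circularity."""
    gray = np.asarray(gray, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if gray.shape != mask.shape:
        raise ValueError("gray image and mask must have the same shape")
    if not mask.any():
        raise ValueError("mask contains no lesion pixels")

    values = gray[mask]              # gray levels inside the lesion only
    area = int(mask.sum())           # lesion area in pixels

    # Circularity stand-in: area relative to the circle, centred on the
    # lesion centroid, that reaches the farthest lesion pixel.
    rows, cols = np.nonzero(mask)
    cy, cx = rows.mean(), cols.mean()
    r_max = np.sqrt((rows - cy) ** 2 + (cols - cx) ** 2).max()
    circularity = area / (np.pi * r_max ** 2) if r_max > 0 else 1.0

    return {
        "area_px": area,
        "mean_gray": float(values.mean()),
        "std_gray": float(values.std()),
        "circularity": float(circularity),
    }
```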

Moreover, it became clear that images taken out of context cannot provide all the information needed for accurate classification. To compensate for this limitation, a questionnaire on the lesion's characteristics was developed (Table 2). The patients' answers to the questions in Table 2 were used, along with the texture, color, and geometric features extracted from the lesion images, to calculate the associated risk degree.

Table 2.

Questionnaire Addressing Lesion's Characteristics

NO.	QUESTION	ANSWERS
1	How long has the lesion existed?	<1 week
		1–4 weeks
		1–3 months
		3 months–1 year
		>1 year
2	Is the lesion always visible?	Yes/no
3	Single or multiple lesions
A	Single	Yes/no
B	Multiple	Yes/no
4	Accompanying characteristics of the lesion(s)
A	Fever	Yes/no
B	Pain	Yes/no
C	Itch	Yes/no
D	Bleeding	Yes/no
E	Development after traumatizing skin	Yes/no
F	Scaling	Yes/no
5	Changes in the past weeks to 3 months
A	Larger, thicker, and/or irregular	Yes/no
B	Color (brighter or darker, multicolored)	Yes/no

The rule-based algorithm reports its result as high risk if it identifies the lesion as malignant or premalignant, and as medium or low risk otherwise.
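How the questionnaire answers modify the image-based rating is not specified in the text. The Python sketch below is a purely hypothetical toy showing one way Table 2 could be encoded and combined with an image-derived level: red flags such as bleeding, growth, or color change escalate the rating, while recent lesions with itch, fever, a traumatic origin, or intermittent visibility de-escalate it. All class names, fields, and rules are invented for illustration; only the final high versus medium/low split mirrors the text.

```python
from dataclasses import dataclass
from enum import Enum


class Duration(Enum):
    """Answer options for question 1 of Table 2."""
    UNDER_1_WEEK = 0
    WEEKS_1_TO_4 = 1
    MONTHS_1_TO_3 = 2
    MONTHS_3_TO_12 = 3
    OVER_1_YEAR = 4


@dataclass
class Questionnaire:
    """One patient's answers to Table 2 (field names are hypothetical)."""
    duration: Duration
    always_visible: bool
    multiple: bool  # question 3: single (False) or multiple (True) lesions
    fever: bool = False
    pain: bool = False
    itch: bool = False
    bleeding: bool = False
    after_trauma: bool = False
    scaling: bool = False
    grew_or_irregular: bool = False  # question 5A
    color_changed: bool = False      # question 5B


RISK_LEVELS = ("low", "medium", "high")


def combine(image_level: str, q: Questionnaire) -> str:
    """Toy rule set: adjust an image-based risk level using the answers."""
    level = RISK_LEVELS.index(image_level)
    # Hypothetical escalation rule: bleeding or recent change is a red flag.
    if q.bleeding or q.grew_or_irregular or q.color_changed:
        level = min(level + 1, 2)
    # Hypothetical de-escalation rule: recent lesions with inflammatory or
    # traumatic features, or lesions that come and go.
    recent = q.duration in (Duration.UNDER_1_WEEK, Duration.WEEKS_1_TO_4)
    if recent and (q.itch or q.fever or q.after_trauma or not q.always_visible):
        level = max(level - 1, 0)
    return RISK_LEVELS[level]


def is_high_risk(level: str) -> bool:
    """Binary outcome used for the statistics: high vs. medium/low."""
    return level == "high"
```

For example, under these toy rules a lesion rated medium from the image alone, present for 3 months to 1 year and bleeding, is escalated to high risk.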

Statistical Evaluation

Sensitivity, specificity, positive predictive value, and negative predictive value were calculated with 95% confidence intervals (CIs) using the online statistical software VassarStats.
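For reference, the Python sketch below computes these four measures from a 2×2 table, with a Wilson score interval with continuity correction (Newcombe's method). That VassarStats uses exactly this interval is assumed here; with the Table 3 counts it gives a sensitivity interval of about 0.625 to 0.909, consistent with the reported 0.62–0.90 when truncated to two decimals. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval with continuity correction for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 2 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z * z - 1
                 - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))) / denom
    if successes == n:
        upper = 1.0
    else:
        upper = (2 * n * p + z * z + 1
                 + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and 95% CI per measure; 'high risk' is the positive call."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_cc(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_cc(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_cc(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_cc(tn, tn + fn)),
    }


# Table 3: high risk = 28 (pre)malignant / 16 benign; low/medium = 7 / 57.
for name, (est, (lo, hi)) in diagnostic_metrics(tp=28, fp=16, fn=7, tn=57).items():
    print(f"{name}: {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```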

Results

For the statistical analysis, we considered that benign lesions should fall into the low- or medium-risk class, whereas melanoma and nonmelanoma skin cancer, along with melanoma in situ, actinic keratosis, and Bowen's disease, should fall into the high-risk class. No images were excluded from the study because of low quality, as we wanted to measure the algorithm's performance under conditions as close as possible to real-world use.

On the 108 cases used for accuracy evaluation, the algorithm (incorporating the patients' answers to the lesion questionnaire) achieved a sensitivity of 80% (95% CI 0.62–0.90) and a specificity of 78% (95% CI 0.66–0.86). The positive and negative predictive values, along with other results, are given in Table 3.

Table 3.

Results Obtained by The Algorithm Incorporating the Patients' Answers to the Questionnaire Regarding the Lesion's Characteristics Versus Clinical Diagnosis—Eindhoven Database-Test Data Set

ALGORITHM	MALIGNANT OR PREMALIGNANT	BENIGN	SUM
High risk	28	16	44	Sensitivity (95% CI)	0.8 (0.62–0.90)	Positive predictive value (95% CI)	0.63 (0.47–0.77)
Low/medium risk	7	57	64	Specificity (95% CI)	0.78 (0.66–0.86)	Negative predictive value (95% CI)	0.89 (0.78–0.95)
Sum	35	73	108

In the group of (pre)malignant lesions in the test data set, 28/35 were rated high risk, 6/35 medium risk, and 1/35 low risk (Table 4). Of the 35 (pre)malignant lesions, only one, a basal cell carcinoma (BCC) that had appeared 3 months to 1 year earlier and was not associated with any symptoms, was assessed as low risk by the algorithm (Fig. 1a,b). All melanomas were rated high risk.
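Collapsing the three-level output into the binary outcome used for the statistics can be checked against Table 4. The Python sketch below tallies the (pre)malignant rows for the recalibrated algorithm with and without the questionnaire; only a high-risk rating counts as a positive call. The data structure and function names are illustrative.

```python
# Table 4, (pre)malignant rows only: (high, medium, low) counts per setting.
PREMALIGNANT_ROWS = {
    "Basal cell carcinoma":    {"with_q": (11, 4, 1), "without_q": (9, 6, 1)},
    "Squamous cell carcinoma": {"with_q": (2, 1, 0),  "without_q": (1, 2, 0)},
    "Actinic keratosis":       {"with_q": (7, 1, 0),  "without_q": (7, 1, 0)},
    "Bowen's disease":         {"with_q": (5, 0, 0),  "without_q": (5, 0, 0)},
    "Melanoma":                {"with_q": (2, 0, 0),  "without_q": (2, 0, 0)},
    "Melanoma in situ":        {"with_q": (1, 0, 0),  "without_q": (1, 0, 0)},
}


def tally(setting: str):
    """Sum ratings over the rows; medium and low are pooled as negative calls."""
    high = sum(row[setting][0] for row in PREMALIGNANT_ROWS.values())
    medium = sum(row[setting][1] for row in PREMALIGNANT_ROWS.values())
    low = sum(row[setting][2] for row in PREMALIGNANT_ROWS.values())
    sensitivity = high / (high + medium + low)
    return high, medium, low, sensitivity


for setting in ("with_q", "without_q"):
    high, medium, low, sens = tally(setting)
    print(f"{setting}: high={high} medium={medium} low={low} sensitivity={sens:.2f}")
# with_q: high=28 medium=6 low=1 sensitivity=0.80
# without_q: high=25 medium=9 low=1 sensitivity=0.71
```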

Fig. 1.

(a) BCC rated low risk by the algorithm; (b) associated fractal map. Color images available online at www.liebertpub.com/tmj

Table 4.

Absolute Numbers and Percentage of the Different Subgroups with False Positive and False Negative Ratings—Eindhoven Database-Test Data Set

DIAGNOSES	NO. INCLUDED	WITH QUESTIONNAIRE: HIGH, n (%)	WITH QUESTIONNAIRE: MEDIUM, n (%)	WITH QUESTIONNAIRE: LOW, n (%)	WITHOUT QUESTIONNAIRE: HIGH, n (%)	WITHOUT QUESTIONNAIRE: MEDIUM, n (%)	WITHOUT QUESTIONNAIRE: LOW, n (%)	BEFORE RECALIBRATION: HIGH, n (%)	BEFORE RECALIBRATION: MEDIUM, n (%)	BEFORE RECALIBRATION: LOW, n (%)
Basal cell carcinoma	16	11 (69)	4 (25)	1 (6)	9 (56)	6 (38)	1 (6)	3 (19)	11 (69)	2 (12)
Squamous cell carcinoma	3	2 (66)	1 (33)	0 (0)	1 (33)	2 (66)	0 (0)	1 (33)	2 (66)	0 (0)
Actinic keratosis	8	7 (88)	1 (12)	0 (0)	7 (88)	1 (12)	0 (0)	3 (43)	3 (43)	1 (14)
Bowen's disease	5	5 (100)	0 (0)	0 (0)	5 (100)	0 (0)	0 (0)	1 (20)	3 (60)	1 (20)
Melanoma	2	2 (100)	0 (0)	0 (0)	2 (100)	0 (0)	0 (0)	1 (50)	1 (50)	0 (0)
Melanoma in situ	1	1 (100)	0 (0)	0 (0)	1 (100)	0 (0)	0 (0)	0 (0)	0 (0)	1 (100)
Psoriasis	10	0 (0)	0 (0)	10 (100)	7 (70)	2 (20)	1 (10)	5 (50)	4 (40)	1 (10)
Eczema	0	—	—	—	—	—	—	—	—	—
Lichen planus-like keratosis	3	2 (66)	0 (0)	1 (33)	2 (66)	0 (0)	1 (33)	1 (33)	1 (33)	1 (33)
Histiocytoma	8	2 (25)	1 (13)	5 (63)	3 (38)	2 (25)	3 (37)	2 (25)	3 (38)	3 (37)
Folliculitis	8	0 (0)	0 (0)	8 (100)	5 (62)	1 (13)	2 (25)	2 (25)	5 (62)	1 (13)
Sebaceous hyperplasia	3	2 (66)	1 (33)	0 (0)	2 (66)	1 (33)	0 (0)	2 (66)	1 (33)	0 (0)
Nevus naevo-cellularis^a	13	2 (15)	2 (15)	9 (70)	3 (23)	2 (15)	8 (62)	2 (15)	5 (38)	6 (47)
Verruca vulgaris	1	1 (100)	0 (0)	0 (0)	1 (100)	0 (0)	0 (0)	1 (100)	0 (0)	0 (0)
Verruca seborrhoica	9	2 (23)	3 (33)	4 (44)	2 (23)	3 (33)	4 (44)	2 (23)	4 (44)	3 (33)
Senile angioma	6	0 (0)	2 (33)	4 (66)	0 (0)	1 (17)	5 (83)	0 (0)	1 (17)	5 (83)
Clear cell acanthoma	1	0 (0)	0 (0)	1 (100)	0 (0)	0 (0)	1 (100)	0 (0)	0 (0)	1 (100)
Scar	4	0 (0)	0 (0)	4 (100)	4 (100)	0 (0)	0 (0)	1 (25)	3 (75)	0 (0)
Lentigo solaris/senilis	7	3 (42)	2 (29)	2 (29)	3 (42)	3 (43)	1 (15)	0 (0)	6 (85)	1 (15)
Total	108

Values shown in bold represent false positive and negative results.

^aAll nevi (junctional, dermal, and dysplastic) were merged into one subgroup.

In the group of benign lesions in the test data set, nearly all diagnoses were based on clinical and dermoscopic examination by a dermatologist. Incisional or excisional biopsies were taken in only six lesions, either because of clinical doubt about (pre)malignancy or because the lesion was excised for complaints. Of the 73 benign lesions, however, 14 were rated high risk by the application and 11 medium risk (Table 4). Four of these lesions were examined histopathologically (results: one lichen planus-like keratosis, one dysplastic nevus, one verruca vulgaris, and one verruca seborrhoica).

The algorithm was also evaluated on the same test set without the patients' answers to the questionnaire (details per class in Table 4); in this setup, the sensitivity was 71% and the specificity 56% (Table 5).

Table 5.

Results Obtained by the Algorithm Without Incorporating the Patients' Answers to the Questionnaire Regarding the Lesion's Characteristics Versus Clinical Diagnosis—Eindhoven Database-Test Data Set

ALGORITHM	MALIGNANT OR PREMALIGNANT	BENIGN	SUM
High risk	25	32	57	Sensitivity (95% CI)	0.71 (0.53–0.84)	Positive predictive value (95% CI)	0.43 (0.30–0.57)
Low/medium risk	10	41	51	Specificity (95% CI)	0.56 (0.44–0.67)	Negative predictive value (95% CI)	0.80 (0.66–0.89)
Sum	35	73	108

CI, confidence interval.

On the same set (details per class in Table 4), the original algorithm, dedicated to melanocytic lesion analysis and melanoma detection, obtained a sensitivity of 25% and a specificity of 75% (Table 6). These poor results were expected, because the original algorithm contained no rules for handling nonpigmented lesions.

Table 6.

Results Obtained by the Algorithm for Melanoma Detection and Melanocytic Lesions (the Algorithm Before Recalibration) Versus Clinical Diagnosis—Eindhoven Database-Test Data Set

ALGORITHM	MALIGNANT OR PREMALIGNANT	BENIGN	SUM
High risk	9	18	27	Sensitivity (95% CI)	0.25 (0.13–0.43)	Positive predictive value (95% CI)	0.33 (0.17–0.53)
Low/medium risk	26	55	81	Specificity (95% CI)	0.75 (0.63–0.84)	Negative predictive value (95% CI)	0.67 (0.56–0.77)
Sum	35	73	108

CI, confidence interval.

To assess the recalibrated algorithm's performance on melanocytic lesions, for melanoma identification only, we reran the tests on the histopathologically verified image database used in the algorithm's first investigation at Ludwig Maximilian University, Munich (LMU study).⁶ The database contained 144 melanocytic lesions (118 benign nevi and 26 melanomas); no questionnaire answers were available as input. Compared with the histopathological results, the sensitivity increased markedly to 88% (initially 73%), while the specificity dropped to ∼79% (initially 83%). The results are given in Table 7.

Table 7.

Results Obtained by the Recalibrated Algorithm Versus Histopathological Results on the Ludwig Maximilian University Database

ALGORITHM	MELANOMA	BENIGN LESIONS	SUM
High risk	23	25	48	Sensitivity (95% CI)	0.884 (0.68–0.96)	Positive predictive value (95% CI)	0.47 (0.33–0.62)
Low/medium risk	3	93	96	Specificity (95% CI)	0.788 (0.70–0.85)	Negative predictive value (95% CI)	0.96 (0.90–0.99)
Sum	26	118	144

CI, confidence interval.

Discussion

This prospective study included melanocytic and nonmelanocytic skin lesions to test the ability of an mHealth app to detect melanoma and nonmelanoma skin cancers, specifically BCC and squamous cell carcinoma (SCC), and their precursors. We also included actinic keratosis and Bowen's disease in the high-risk class, to urge users to visit a doctor, because these lesions may progress to invasive skin cancer.

Although no images were excluded from the study because of low quality, image acquisition proved problematic for lesions with ulceration or bleeding on top, lesions surrounded by mottled or extremely tanned skin, lesions located in skin folds or hairy areas, and lesions with other skin lesions in close proximity. Focusing on and capturing these lesions was difficult, and the resulting image quality was low. This was also the case for four of the six malignant skin tumors (three BCCs and one SCC) that received medium- or low-risk ratings. These situations are defined as exclusion criteria for future studies. In addition, the app itself lists them in the Contraindication section (Terms and Conditions) as factors that might affect assessment quality.

Most verruca seborrhoica and lentigo solaris lesions, two harmless but common skin lesions, were not classified as low risk (55% and 71%, respectively). In practice, this could cause patients unnecessary anxiety. Because these lesions are often multiple, adjusting the questionnaire in future studies might help the application differentiate them better.

A limitation of the study is the low number of SCCs: only three tumors were included, one of which was located in an area with extreme skin folds, which may explain its medium rating. In three benign subgroups (lichen planus-like keratosis, sebaceous hyperplasia, and verruca vulgaris), the number of included lesions was also limited to at most three per diagnosis, which may likewise affect the results.

According to the questionnaire answers, 25 of the 35 (pre)malignant lesions had appeared within the last 1 month to 1 year. This suggests that the application might be useful for early detection of the most common types of skin cancer and their precursors (Fig. 2a,b).

Fig. 2.

(a) BCC rated high risk by the algorithm (appeared in the last 1 to 3 months, no changes, no other symptoms); (b) associated fractal map. Color images available online at www.liebertpub.com/tmj

The recalibrated algorithm (with or without the patients' answers to the lesion questionnaire) has a higher sensitivity than the original algorithm on both the Eindhoven and LMU data sets. The specificity on the Eindhoven data set is low when the patients' input on the lesion is disregarded. (This low specificity is due to the test set consisting mainly of skin lesions in the differential diagnosis of nonmelanoma skin cancer.) When the lesion characterization provided by the patients is considered, the specificity increases by 22 percentage points (mainly due to correct identification of folliculitis, scars, and psoriasis), and the sensitivity increases by 9 percentage points (by correctly identifying a series of bleeding BCCs as dangerous). Providing context to an image thus increases the accuracy of the assessment. On the LMU data set, adapting the algorithm to also analyze nonmelanocytic lesions decreased the specificity only slightly (by 4 percentage points), while the sensitivity increased by ∼15 percentage points.
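The improvements quoted above are absolute differences between proportions (percentage points), not relative increases. The small Python computation below, using the 2×2 counts from Tables 3 and 5, makes the distinction explicit; the function names are illustrative.

```python
def pp_change(before: float, after: float) -> float:
    """Absolute change, in percentage points."""
    return 100.0 * (after - before)


def relative_change(before: float, after: float) -> float:
    """Relative change, in percent of the starting value."""
    return 100.0 * (after - before) / before


# Eindhoven test set: Table 5 (without questionnaire) vs. Table 3 (with it).
spec_without, spec_with = 41 / 73, 57 / 73
sens_without, sens_with = 25 / 35, 28 / 35

print(f"specificity: +{pp_change(spec_without, spec_with):.1f} pp "
      f"(+{relative_change(spec_without, spec_with):.0f}% relative)")
print(f"sensitivity: +{pp_change(sens_without, sens_with):.1f} pp "
      f"(+{relative_change(sens_without, sens_with):.0f}% relative)")
# specificity: +21.9 pp (+39% relative)
# sensitivity: +8.6 pp (+12% relative)
```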

Although not evaluated in this study, previous literature indicates that the app is not as sensitive and specific as the dermatologist's clinical eye.⁶ However, skin lesions are also treated by professionals who are less familiar with differentiating benign from malignant lesions, such as general practitioners and (plastic) surgeons, whose diagnostic accuracy is much lower,^8–10 which may result in overtreatment (unnecessary excision of benign lesions) or incorrect referral to secondary care. For those willing to improve their diagnostic skills,¹¹ the app might help them accurately recognize benign lesions and differentiate them from skin malignancies, or might offer them the opportunity to consult dermatologists in case of doubt; however, this should be investigated further.

Conclusions

At this point, the mHealth application for skin lesion risk assessment cannot serve as a diagnostic tool, but it is useful in raising awareness for both melanoma and nonmelanoma skin cancer among users without creating panic or a false sense of security.

The presentation of the algorithm's results is intended to offer an assessment of, and information on, the analyzed lesion. For example, if a lesion receives a “high risk” rating, the user is informed that the lesion strongly resembles a skin cancer but may still be benign; if a lesion receives a “low risk” rating, the user is informed that the lesion seems healthy but that there is a small probability that it is malignant. For all ratings, the user is informed that the “assessment does not intend to provide an official medical diagnosis, nor replace visits to a doctor.” The user is encouraged to test his/her lesions periodically and to archive both the images and the results to follow the lesion's evolution over time. Moreover, the user can share the archive with his/her doctor.

The app might also help some practitioners decide whether to treat a lesion or refer a patient.

Further, larger multicenter studies are already being considered to gather more data, improve the overall accuracy of the algorithm, and investigate the impact of the app on healthcare delivery by different professionals.

Footnotes

Acknowledgment

The design and oversight of this study were supported, in part, by SkinVision B.V.

Disclosure Statement

M.T., M.H., and T.B. have nothing to disclose. A.U. and T.R. report receiving fees from SkinVision B.V., during this study.

References

1. Boulos, Brewer, Karimkhani, Buller, Dellavalle. Mobile medical and health apps: State of the art, concerns, regulatory control and certification. Online J Public Health Inform, 2014; 5:229.

2. Kassianos, Emery, Murchie, Walter. Smartphone applications for melanoma detection by community, patient and generalist clinician users: A review. Br J Dermatol, 2015; 172:1507–1518.

3. Wolf, Moreau, Akilov, Patton, English III, Ho, Ferris. Diagnostic inaccuracy of smartphone applications for melanoma detection. JAMA Dermatol, 2013; 149:422–426.

4. Robson, Blackford, Roberts. Caution in melanoma risk analysis with smartphone application technology. Br J Dermatol, 2012; 167:703–704.

5. Hamilton, Brady. Medical professional involvement in smartphone “apps” in dermatology. Br J Dermatol, 2012; 167:220–221.

6. Maier, Kulichova, Schotten, Astrid, Ruzicka, Berking, Udrea. Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological result. J Eur Acad Dermatol Venereol, 2015; 29:663–667.

7. Udrea, Lupu. Real-time acquisition of quality verified nonstandardized color images for skin lesions risk assessment: A preliminary study. ICSTCC, 2014; 199–204.

8. Goulding, Levine, Blizard, Deroide, Swale. Dermatology surgery: A comparison of activity and outcomes in primary and secondary care. Br J Dermatol, 2009; 161:110–114.

9. Morrison, O'Loughlin, Powell. Suspected skin malignancy: A comparison of diagnoses of family practitioners and dermatologists in 493 patients. Int J Dermatol, 2001; 40:104–107.

10. van Rijsingen, Vossen, van Huystee, Gorgels, Gerritsen. Skin tumour surgery in primary care: Do general practitioners need to improve their surgical skills? Dermatology, 2015; 230:318–323.

11. van Rijsingen MCJ, van Bon, van der Wilt, Lagro-Janssen, Gerritsen. The current and future role of general practitioners in skin cancer care. An assessment of 268 general practitioners. Br J Dermatol, 2014; 170:1366–1368.