Abstract
Background:
With the advent of smartphone devices, an increasing number of mHealth applications that target melanoma identification have been developed, but none addresses the general context of melanoma and nonmelanoma skin cancer identification.
Introduction:
In this study a smartphone application using fractal and classical image analysis for the risk assessment of skin lesions is systematically evaluated to determine its sensitivity and specificity in the diagnosis of melanoma and nonmelanoma skin cancer along with actinic keratosis and Bowen's disease.
Materials and Methods:
In the Department of Dermatology, Catharina Hospital Eindhoven, The Netherlands, 341 melanocytic and nonmelanocytic lesions were imaged using the SkinVision app; 239 underwent histopathological examination, while the remaining 102 lesions were clinically diagnosed as clearly benign and not removed. The algorithm was calibrated using the images of the first 233 lesions. The calibrated version of the algorithm was then applied to the remaining 108 lesions, and the results were compared with the medical findings.
Results:
On the 108 cases used for evaluation the algorithm scored 80% sensitivity and 78% specificity in detecting (pre)malignant conditions.
Discussion:
Although less accurate than the dermatologist's clinical eye, the app may offer support to other professionals who are less familiar with differentiating between benign and malignant lesions.
Conclusion:
An mHealth application for the risk assessment of skin lesions was evaluated. It adds value to diagnosis tools of its type by taking into consideration pigmented and nonpigmented lesions all together and detecting signs of malignancy with high sensitivity.
Introduction
With the advances of mobile technology, an increasing number of mHealth applications related to dermatology have emerged. A particular segment aims at melanoma detection, but few clinical studies discuss their accuracy, which has led to criticism. 1,2–5
In this context, this article presents a risk assessment algorithm for (pre)malignant lesion detection that is integrated in the SkinVision mHealth application (developed by SkinVision B.V., The Netherlands). The app primarily targets laypersons, but can also be used by nondermatologists. Such an app should be certified by the appropriate regulatory bodies; this one has already been certified in Europe, New Zealand, and Australia.
The algorithm was initially dedicated to the analysis of melanocytic lesions and melanoma detection. It underwent a clinical study and scored a sensitivity of 73% in detecting melanoma, while the specificity was 83%. 6
The algorithm has now been adapted and recalibrated for a much broader segment of (pre)malignant skin lesion diagnosis. With this study, we assess the sensitivity and specificity of the recalibrated algorithm in the diagnosis of melanoma (including in situ melanoma) and nonmelanoma skin cancer, along with the premalignant skin lesions actinic keratosis and Bowen's disease, compared to clinical diagnosis and histopathological results.
Materials and Methods
Materials and Data Acquisition
We included, in total, 341 lesions in 256 consecutive patients seen routinely for different skin problems, including skin cancer follow-up at the Department of Dermatology, Catharina Hospital Eindhoven, The Netherlands in the period December 2014–April 2016, after obtaining the patients' written informed consent. The study had been approved by the local ethics committee (No. 2014-41).
Consecutive patients were seen by one dermatologist and one resident in dermatology. The dermatologist selected lesions that were clinically clearly benign, lesions found during total body skin examination in patients with a history of multiple skin malignancies, and lesions in patients referred by a general practitioner for skin malignancies. The lesions underwent visual and dermatoscopic diagnosis, and in 239 cases either incisional or excisional biopsies were performed, followed by histopathological examination of the specimens. The remaining 102 lesions were clinically clearly benign and not removed.
All the skin lesions were imaged by the participating physicians using an iPhone 5 smartphone (equipped with an 8 megapixel autofocus camera, 1080p, 30 frames/s in video mode) and the SkinVision imaging application. 7 Image acquisition is done in video mode so that quality-checked images of a skin condition can be easily acquired in real time by a user. The obtained images are in focus, free of shadows, and completely contain the lesion of interest.
The acquired data were used as follows: 233 pigmented and nonpigmented melanocytic and nonmelanocytic lesions (Table 1) were used as the training data set for calibrating the rule-based algorithm, and the remaining 108 cases (Table 1) were used as the test data set for evaluating the algorithm's specificity and sensitivity in detecting (pre)malignant lesions. Programming was completed before the biopsy results were available.
The Pigmented and Nonpigmented Melanocytic and Nonmelanocytic Lesions Collected and Used During the Study
Risk Assessment Algorithm
The risk assessment algorithm is based on fractal and classical image analysis and has been presented in detail in Ref. 6. For this study, the rule-based algorithm was recalibrated to accommodate nonpigmented skin conditions. The training data were used to add new rules and improve the existing ones.
To increase the specificity for nonpigmented lesions and the sensitivity for melanoma, the following new parameters were also taken into consideration: lesion area, mean gray scale value and standard deviation over the lesion, and circularity of the lesion extracted from the fractal map. The lesion detection followed the procedure described in Ref. 7.
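The kind of per-lesion parameters listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lesion_features`, the binary-mask input, the edge-counting perimeter, and the circularity definition 4πA/P² are all assumptions made for illustration, and the sketch works on a plain mask rather than on the fractal map used in the study.

```python
import math
from statistics import mean, pstdev


def lesion_features(gray, mask):
    """Compute simple geometric and intensity features for one lesion.

    gray: 2D list of gray scale values.
    mask: 2D list of 0/1 flags of the same shape, 1 marking lesion pixels
          (hypothetical input; the study derives the lesion from a fractal map).
    """
    rows, cols = len(mask), len(mask[0])
    values = []
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            values.append(gray[r][c])
            # Every pixel edge that faces background or the image border
            # contributes one unit to the perimeter estimate.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or not mask[rr][cc]:
                    perimeter += 1
    if not values:
        raise ValueError("mask contains no lesion pixels")
    area = len(values)
    return {
        "area": area,
        "mean_gray": mean(values),
        "std_gray": pstdev(values),
        # Assumed definition: 4*pi*A / P^2, larger for more compact shapes.
        "circularity": 4 * math.pi * area / perimeter ** 2,
    }


if __name__ == "__main__":
    gray = [[0, 0, 0, 0], [0, 10, 20, 0], [0, 30, 40, 0], [0, 0, 0, 0]]
    mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    print(lesion_features(gray, mask))
```

On pixel grids the edge-counting perimeter overestimates the length of curved boundaries, so absolute circularity values from this sketch are only meaningful relative to each other.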
Moreover, images taken out of context cannot provide all the information needed for an accurate classification. To compensate for this limitation, a questionnaire regarding the lesion's characteristics was developed (Table 2). The patients' answers to the questions in Table 2 were used along with the texture, color, and geometric features extracted from the skin lesions' images to calculate the associated risk degree.
Questionnaire Addressing Lesion's Characteristics
The results of the rule-based algorithm are presented as follows: high risk if the algorithm identifies the lesion as malignant or premalignant, and medium or low risk otherwise.
Statistical Evaluation
The sensitivity, specificity, positive predictive value, and negative predictive value were calculated with 95% confidence intervals (CIs) using the online statistical software VassarStats.
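The four measures and an interval for a proportion can be sketched as follows. This is a minimal illustration, not the VassarStats procedure itself: the function names are hypothetical, and the interval shown is a Wilson score interval without continuity correction, which is an assumption. A tool that applies a continuity correction will give slightly wider bounds.

```python
import math


def diagnostic_metrics(tp, fn, fp, tn):
    """Return the four standard measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # flagged among (pre)malignant
        "specificity": tn / (tn + fp),  # not flagged among benign
        "ppv": tp / (tp + fp),          # (pre)malignant among flagged
        "npv": tn / (tn + fn),          # benign among not flagged
    }


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a proportion (no continuity correction)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # 28 of the 35 (pre)malignant test lesions were rated high risk.
    low, high = wilson_interval(28, 35)
    print(f"sensitivity = {28 / 35:.2f}, 95% CI {low:.2f}-{high:.2f}")
```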
Results
For the statistical analysis, we considered that lesions proven benign should fall in the low- or medium-risk class, and that melanoma and nonmelanoma skin cancer, along with in situ melanoma, actinic keratosis, and Bowen's disease, should fall in the high-risk class. No images were excluded from the study due to low quality, because we wanted to measure the performance of the algorithm as close as possible to a real-use setup.
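The scoring convention just described can be written down as a small sketch. The diagnosis labels, rating strings, and function name are hypothetical placeholders chosen for illustration; only the rule itself (high risk counts as a positive call, medium and low risk as negative) comes from the text.

```python
# Diagnoses that are expected to fall in the high-risk class.
PRE_MALIGNANT = frozenset({
    "melanoma",
    "in situ melanoma",
    "basal cell carcinoma",
    "squamous cell carcinoma",
    "actinic keratosis",
    "bowen's disease",
})


def outcome(diagnosis, rating):
    """Classify one lesion as TP, FN, FP, or TN.

    diagnosis: reference diagnosis (clinical or histopathological).
    rating: algorithm output, one of "high", "medium", "low".
    """
    if rating not in {"high", "medium", "low"}:
        raise ValueError(f"unknown rating: {rating!r}")
    should_flag = diagnosis.strip().lower() in PRE_MALIGNANT
    flagged = rating == "high"  # medium and low both count as negative
    if should_flag:
        return "TP" if flagged else "FN"
    return "FP" if flagged else "TN"


if __name__ == "__main__":
    print(outcome("Basal cell carcinoma", "low"))   # a missed carcinoma
    print(outcome("Verruca seborrhoica", "medium"))  # benign, not flagged
```

Under this convention a benign lesion rated medium risk still counts as a correct negative, so specificity does not capture every rating that might worry a user.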
On the set of 108 cases used for accuracy evaluation, the algorithm (incorporating the patients' answers to the questionnaire regarding the lesion) obtained a sensitivity of 80% (95% CI 62–90%) and a specificity of 78.08% (95% CI 66–86%). The positive and negative predictive values, along with other results, can be found in Table 3.
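Unlike sensitivity and specificity, predictive values depend on how common (pre)malignant lesions are among the lesions assessed. The sketch below, with a hypothetical function name, applies Bayes' rule to the reported sensitivity and specificity. The test-set prevalence of 35/108 comes from the study; the 5% prevalence is an invented value used only to show the direction of the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    sens, spec = 0.80, 0.7808  # values reported for the test set
    for label, prev in (("test set", 35 / 108), ("hypothetical 5%", 0.05)):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With the same sensitivity and specificity, a lower prevalence yields a lower positive predictive value and a higher negative predictive value.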
Results Obtained by The Algorithm Incorporating the Patients' Answers to the Questionnaire Regarding the Lesion's Characteristics Versus Clinical Diagnosis—Eindhoven Database-Test Data Set
In the group of (pre)malignant lesions from the test data set, 28/35 were rated high risk, 6/35 medium risk, and 1/35 low risk (Table 4). Of the 35 (pre)malignant lesions, only one basal cell carcinoma (BCC), which had appeared within the last 3 months to 1 year and was not associated with any symptoms, was assessed as low risk by the algorithm (Fig. 1a,b). All melanomas were rated high risk.

Absolute Numbers and Percentage of the Different Subgroups with False Positive and False Negative Ratings—Eindhoven Database-Test Data Set
Values shown in bold represent false positive and negative results.
All nevi (junctional, dermal, and dysplastic) have been merged in one subgroup.
In the group of benign lesions from the test data set, nearly all diagnoses were based on clinical and dermoscopic examination by a dermatologist. Incisional/excisional biopsies were taken in only six lesions, either because of clinical doubt about (pre)malignancy or because the lesion was excised due to complaints. Out of 73 benign lesions, however, 14 were rated as high risk by the application and 11 as medium risk (Table 4). Four of these lesions were examined histopathologically (results: one lichen planus-like keratosis, one dysplastic nevus, one verruca vulgaris, and one verruca seborrhoica).
The algorithm was also evaluated on the same test set without the patients' answers to the questionnaire (details per class in Table 4); in this setup, the sensitivity was 71% and the specificity was 56% (Table 5).
Results Obtained by the Algorithm Without Incorporating the Patients' Answers to the Questionnaire Regarding the Lesion's Characteristics Versus Clinical Diagnosis—Eindhoven Database-Test Data Set
CI, confidence interval.
The original algorithm, dedicated to melanocytic lesion analysis and melanoma detection, obtained a sensitivity of 25% and a specificity of 75% on the same set (details per class in Table 4; Table 6). These poor results were to be expected, because the original algorithm contained no rules for handling nonpigmented lesions.
Results Obtained by the Algorithm for Melanoma Detection and Melanocytic Lesions (the Algorithm Before Recalibration) Versus Clinical Diagnosis—Eindhoven Database-Test Data Set
CI, confidence interval.
To assess the recalibrated algorithm's performance on melanocytic lesions, for melanoma identification only, we reran the tests on the database of images with histopathology that was used during the algorithm's first investigation, performed at Ludwig Maximilian University, Munich (LMU study). 6 The database contained 144 melanocytic lesions: 118 benign nevi and 26 melanomas (without the answers to the questions as input). Comparing the algorithm's results with the histopathological results, one can observe a marked increase in sensitivity to 88% (initially, the sensitivity in detecting melanoma was 73%), while the specificity dropped to ∼79% (initially 83%). The results can be found in Table 7.
Results Obtained by the Recalibrated Algorithm Versus Histopathological Results on the Ludwig Maximilian University Database
CI, confidence interval.
Discussion
This prospective study targeted melanocytic and nonmelanocytic skin lesions to test the ability of an mHealth app in detecting melanoma and nonmelanoma skin cancers, specifically BCC and squamous cell carcinoma (SCC), and their precursors. We have considered that actinic keratosis and Bowen's disease should be included in the high risk class, too, to urge the users to visit the doctor because these lesions may progress to invasive skin cancer in the future.
Although no images were excluded from the study due to low quality, the image acquisition step proved problematic for skin lesions with ulceration or bleeding on top, lesions surrounded by mottled or extremely tanned skin, lesions localized in skin folds and hairy areas, and lesions with other skin lesions in close proximity. Focusing on and capturing them was difficult, and the image quality was quite low. This was also the case in four out of six malignant skin tumors (three BCC and one SCC) that received medium- or low-risk ratings. The situations listed above will be exclusion criteria for participation in future studies. In addition, in the app itself, these criteria are clearly mentioned in the Contraindication section (Terms and Conditions) as factors that might alter the assessment quality.
Most of the verruca seborrhoica and lentigo solaris, two harmless but frequently appearing skin lesions, were not classified as low risk (55% and 71%, respectively). In practice this means that patients could panic unnecessarily. Because in many patients these lesions are multiple, in future studies adjustment of the questionnaire might be helpful for better differentiation by the application.
A limitation of the study is the low number of SCCs. Only three tumors were included, one of them being localized in an area with extreme skin folds which may be the reason for receiving a medium rating. In three subgroups of benign lesions the number of included lesions was also limited to three for each diagnosis (lichen planus-like keratosis, sebaceous hyperplasia, and verruca vulgaris), which may interfere with the results of the analysis, too.
According to the answers given in the questionnaire, 25 out of 35 (pre)malignant lesions had appeared within the last 1 month to 1 year. This suggests that the application might be useful for early detection of the most common types of skin cancer and their precursors (Fig. 2a, b).

The recalibrated algorithm (with or without the patients' answers to the questions regarding the lesion) has a higher sensitivity than the original algorithm on both the Eindhoven and LMU data sets. The specificity is low for the Eindhoven data set when the patients' input on the lesion is disregarded. (The low specificity is due to the fact that the test set mainly included skin lesions belonging to the differential diagnosis of nonmelanoma skin cancer.) When the lesion characterization provided by the patients is taken into account, the specificity increases by 22 percentage points (mainly due to correct identification of folliculitis, scars, and psoriasis), while the sensitivity increases by 9 percentage points (by correctly identifying a series of bleeding BCCs as dangerous). Providing context to an image thus increases the accuracy of the assessment. On the LMU data set, adapting the algorithm to also analyze nonmelanocytic lesions decreased the specificity only slightly (4 percentage points), while the sensitivity increased by ∼15 percentage points.
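The comparisons in this paragraph are differences between reported percentages, that is, percentage points. A minimal sketch of that arithmetic follows; the dictionary keys and function name are hypothetical labels, while the values are the rounded sensitivities and specificities reported in the Results.

```python
# (sensitivity %, specificity %) as reported in the text, rounded.
REPORTED = {
    "eindhoven_original": (25, 75),
    "eindhoven_recalibrated_image_only": (71, 56),
    "eindhoven_recalibrated_with_answers": (80, 78),
    "lmu_original": (73, 83),
    "lmu_recalibrated_image_only": (88, 79),
}


def change_in_points(before, after):
    """Return (sensitivity change, specificity change) in percentage points."""
    sens_before, spec_before = REPORTED[before]
    sens_after, spec_after = REPORTED[after]
    return sens_after - sens_before, spec_after - spec_before


# Effect of adding the patients' answers on the Eindhoven test set.
questionnaire_effect = change_in_points(
    "eindhoven_recalibrated_image_only",
    "eindhoven_recalibrated_with_answers",
)

# Effect of recalibration on the LMU melanocytic data set.
lmu_effect = change_in_points("lmu_original", "lmu_recalibrated_image_only")

if __name__ == "__main__":
    print("questionnaire:", questionnaire_effect)
    print("LMU recalibration:", lmu_effect)
```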
Although this was not evaluated in the present study, the previous literature indicates that the app is not as sensitive and specific as the dermatologist's clinical eye. 6 However, skin lesions are also treated by professionals who are less familiar with differentiating between benign and malignant lesions, such as general practitioners and (plastic) surgeons, who score much lower, 8–10 which may result in overtreatment (unnecessary excisions of benign lesions) or incorrect referral to secondary care. If they are willing to improve their diagnostic skills, 11 the app might support them in accurately recognizing benign lesions and differentiating them from skin malignancies, or might offer them the opportunity to communicate with dermatologists in case of doubt; however, this should be investigated further.
Conclusions
At this point, the mHealth application for skin lesion risk assessment cannot serve as a diagnostic tool, but it is useful in raising awareness for both melanoma and nonmelanoma skin cancer among users without creating panic or a false sense of security.
The presentation of the algorithm's results intends to offer an assessment and information on the analyzed lesion. For example, if a lesion receives a “high risk” rating, the user is informed that the lesion strongly resembles a skin cancer, but there is the possibility that the lesion is benign; if a lesion receives a “low risk” rating the user is informed that the lesion seems healthy, but there is a small probability that the lesion is malignant. For all ratings, the user is informed that the “assessment does not intend to provide an official medical diagnosis, nor replace visits to a doctor.” The user is encouraged to periodically test his/her lesions and archive both the images and the results to compare the lesion's evolution in time. Moreover, the user can share the archive with his/her doctor.
The app might also help some practitioners decide whether to treat a lesion or refer a patient.
Further, larger multicenter studies are already being considered to gather more data, to improve the overall accuracy of the algorithm, and to investigate the impact of the app on the supply of healthcare by different professionals.
Footnotes
Acknowledgment
The design and oversight of this study were supported, in part, by SkinVision B.V.
Disclosure Statement
M.T., M.H., and T.B. have nothing to disclose. A.U. and T.R. report receiving fees from SkinVision B.V., during this study.
