Abstract
Introduction
eHealth emerged as an interdisciplinary research area about 70 years ago. This study employs probabilistic techniques to semantically analyse scientific literature related to the field of eHealth in order to identify topics and trends and discuss their comparative evolution.
Methods
Authors collected titles and abstracts of published literature on eHealth as indexed in PubMed. Basic statistical and bibliometric techniques were applied to describe the collected corpus overall; Latent Dirichlet Allocation was employed for unsupervised topic identification; topic trend analysis was performed; and correlation graphs were plotted where relevant.
Results
A total of 30,425 records on eHealth were retrieved from PubMed (all records till 31 December 2017, search on 8 May 2018) and 23,988 of these were included in the study corpus. The eHealth domain shows a growth higher than the growth of the entire PubMed corpus, with a mean increase of eHealth corpus proportion of about 7% per year for the last 20 years. Probabilistic topic modelling identified 100 meaningful topics, which were organised by the authors in nine different categories: general; service model; disease; medical specialty; behaviour and lifestyle; education; technology; evaluation; and regulatory issues.
Discussion
Trends analysis shows a continuous shift in focus. Early emphasis on medical image transmission and system integration has been replaced by increased focus on standards, wearables and sensor devices, now giving way to mobile applications, social media and data analytics. Attention on disease is also shifting, from initial popularity of surgery, trauma and acute heart disease, to the emergence of chronic disease support, and the recent attention to cancer, infectious disease, mental disorders, paediatrics and perinatal care; most interestingly the current swift increase is in research related to lifestyle and behaviour change. The steady growth of all topics related to assessment and various systematic evaluation techniques indicates a maturing research field that moves towards real world application.
Introduction
eHealth is an interdisciplinary area that exploits informatics and communication technologies for the improvement of health and healthcare delivery. The Medical Subject Headings (MeSH) controlled vocabulary by the USA National Library of Medicine1 classifies eHealth in the group of terms including telemedicine, telehealth and mobile health (mHealth). A number of recent reviews cover different areas of the field, for example service models2,3 or specific diseases.4–6 The entire field of eHealth (or telemedicine, as was the preferred term in earlier days) has been explored via several bibliometric studies7–11 that have analysed the growth of eHealth literature and its geographical, author and journal distribution. However, its content is far less analysed.
A first attempt at content analysis of the field measured word frequencies in a corpus of 17,932 records retrieved from PubMed and investigated the change in frequencies between two different time spans, namely the early epoch 1970–1995, and the contemporary epoch 2009–2013.8 Another study used the in-house thematic organization of journals in categories established by the Thomson Reuters Web of Science literature indexing database and counted the number of papers in each category (out of a total of 7960 papers) to rank different categories and their change in rank over five-year periods.9 A more recent approach used the keywords in a corpus of 3272 abstracts on regional health information networks retrieved after screening from PubMed, Elsevier and Springer indexing databases to plot knowledge network concept maps for high frequency keywords.10 Similarly, keyword frequencies and their sudden changes over time were used to analyse a corpus of 2074 records on mHealth retrieved from the Thomson Reuters Web of Science literature indexing database.11 Three of these approaches are limited to measuring word frequencies, and are further limited either by the coarseness of the time scale used for trends analysis,8 or by their restriction to specific sub-fields, namely ‘regional health information networks’10 and ‘mHealth’.11 The fourth approach9 analyses the evolution of preset journal categories of a specific bibliographic database.
In this article, we present an overall analysis of eHealth topics and trends in published literature indexed in PubMed, based on unsupervised topics modelling techniques.12,13 The proposed analysis encompasses the broader field of eHealth; uses an unsupervised statistical approach that is based on probabilistic analysis, clustering and multi-label classification of all the words in the abstract (including title and keywords) of the publications in the corpus; produces user-unbiased, immersive word clouds to characterise different topics; and provides yearly trends analysis for identified topics.
Methods
The analysis of eHealth literature trends followed a three-step approach. First, a generic PubMed query was used to build the corpus of published literature on eHealth. Then, main topics in the area of eHealth were identified via the Latent Dirichlet Allocation (LDA) unsupervised topics modelling algorithm. Finally, trends were deduced based on the popularity of each topic per year. The flowchart of the experimental process is shown in Figure 1 and presented in detail in the following paragraphs. This work expands our earlier pilot attempt14 by redesigning the initial query to include appropriate terms as justified by current literature and by tuning the probabilistic algorithm for stability.

Flowchart of the experimental procedure to identify eHealth topics and their trends in PubMed indexed literature.
Search strategy
In a systematic review, the researcher sets out to gather a maximalist set of literature, then goes through the items one by one and uses a predefined set of inclusion-exclusion criteria to extract only relevant literature. In our approach the requirement is to identify a representative sample of literature, to be used for deducing topics and their trends. Since LDA is an unsupervised, probabilistic approach, a major error can arise if the corpus contains documents irrelevant to the desired field (because this would create irrelevant topics or skew existing topics). Thus, in order to create the corpus one can either follow the same approach as in the systematic review or devise another search strategy to ensure that the retrieved documents are relevant to the field in question.
First, we decided to limit our search to the PubMed bibliographic database,15 as the most popular and comprehensive database for literature on biomedical sciences. A simple search with the keyword 'eHealth' in PubMed returns records on the order of 30,000, so a one-by-one study of all records becomes an impractical task. Thus, we opted for an alternative search strategy which would bring a representative part of the literature while ensuring that the corpus would not contain irrelevant papers.
To achieve this, we first relied on the MeSH categorization of papers in PubMed, which is performed by experts of the USA National Library of Medicine (NLM). The MeSH controlled vocabulary considers eHealth as an entry term for the MeSH heading ‘Telemedicine’, together with the additional terms ‘Mobile Health’, ‘Telehealth’ and ‘mHealth’. So, searching by ‘telemedicine [MeSH Terms]’ includes all articles identified by experts as corresponding to this category, and the search automatically explodes to include all narrower terms in the hierarchical MeSH list. We also added as query terms the basic eHealth areas identified in a recent review of telehealth service frameworks,16 which describes eHealth as a superset of ‘Telehealth’, ‘Telemedicine’, ‘mHealth’ and ‘Telecare’.
The exact query is shown in Figure 2. The search was designed to return all papers categorised under the specific MeSH term and all papers including any of the above keywords in the metadata fields title (TI) or abstract (AB). The search considered all PubMed records published up to the end of year 2017. The query ‘0001/01/01' (PDAT): ‘2017/12/31' (PDAT) was used to retrieve the total number of publications included in PubMed till the end of year 2017. PubMed database was searched via its proprietary search engine interface on 9 May 2018 and the results were retrieved in XML format using the provided export function.

PubMed query to identify published scientific literature on eHealth.
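As a rough illustration of how a query of this kind could be assembled, the following Python sketch combines a MeSH heading, a set of title/abstract keywords and a publication date range. The keyword list, the use of the combined [Title/Abstract] field tag and the function name build_query are assumptions made for illustration; the exact query used in the study is the one given in Figure 2.

```python
# Sketch only: the term list and field tags are illustrative assumptions,
# not a transcription of the query in Figure 2.
MESH_HEADING = "telemedicine"
KEYWORDS = ["eHealth", "telehealth", "telemedicine", "mHealth", "telecare"]


def build_query(mesh_heading, keywords, start="0001/01/01", end="2017/12/31"):
    """Combine a MeSH heading, title/abstract keywords and a date range."""
    clauses = [f"{mesh_heading}[MeSH Terms]"]
    # [Title/Abstract] is PubMed's combined title/abstract field tag.
    clauses += [f"{kw}[Title/Abstract]" for kw in keywords]
    topic_part = " OR ".join(clauses)
    # PDAT restricts the search by publication date.
    date_part = f'("{start}"[PDAT] : "{end}"[PDAT])'
    return f"({topic_part}) AND {date_part}"


query = build_query(MESH_HEADING, KEYWORDS)
print(query)
```

The resulting string can be pasted into the PubMed search box; retrieving and exporting the records is a separate step that is not covered by this sketch.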
Topic modelling
Topic modelling algorithms are statistical methods that automatically extract topics from a large and unstructured collection of documents. In this work, we used the LDA algorithm,12,13 as it has been shown to achieve the highest precision in comparison to other topic modelling algorithms in corpora of Wikipedia and New York Times documents.17 Furthermore, LDA has been successfully applied in many other research areas, for example to analyse and classify genomic sequences,18 classify images based on visual words topic modelling,19 detect discussion themes in social networks20 and analyse source code.21
The LDA model assumes that each document is a mixture of topics. A topic is characterised by a collection of words, each word contributing with its own weight. A word can belong to multiple topics and documents can contain multiple topics. The algorithm starts by randomly assigning each word of a document to one of K topics. Then, it calculates conditional probabilities for each topic in each document.
The concept of this iterative process is shown in Figure 3. The first step involves the random allocation of each word to a topic. In the second step, the algorithm re-assigns each word to a topic: it assumes that all words (apart from the word in question) are assigned correctly and uses this information to calculate (a) how prevalent the word is across the topics, and (b) how prevalent the topics are in each document. Based on these calculations, it re-assigns the word to the most probable topic. The process is repeated for each word in the corpus, and the whole re-assignment pass is then repeated for a large number of iterations.
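The two steps described above can be illustrated with a toy collapsed Gibbs sampler in Python. This is a minimal sketch under stated assumptions, not the MALLET implementation used in the study: the hyperparameters alpha and beta, the function name lda_gibbs and the toy documents are introduced here for illustration, and each word is re-assigned by sampling in proportion to the product of the two prevalences (the usual Gibbs sampling step) rather than by always picking the single most probable topic.

```python
import random


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]   # words of doc d in topic k
    topic_word = [{} for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                 # all words in topic k
    assign = []                                    # topic of each word token

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Step 1: random allocation of every word to one of the topics.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            update(d, w, k, +1)

    # Step 2: repeatedly re-assign each word, holding all other words fixed.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)  # take the word out
                weights = [
                    (doc_topic[d][k] + alpha)              # topic prevalence in doc
                    * (topic_word[k].get(w, 0) + beta)     # word prevalence in topic
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new_k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = new_k
                update(d, w, new_k, +1)         # put it back under the new topic
    return assign, doc_topic, topic_word


# Invented toy documents, for illustration only.
docs = [
    ["mobile", "app", "text", "message", "mobile"],
    ["image", "transmission", "radiology", "image"],
    ["mobile", "app", "radiology", "image"],
]
assign, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
```

After the final pass, doc_topic[d] holds the number of words of document d assigned to each topic, and topic_word[k] holds the word counts that characterise topic k.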

The principles of Latent Dirichlet Allocation algorithm.
For this study, we used the MALLET22 machine learning for language toolkit (ver. 2.0.8), which includes a fast and highly scalable implementation of the LDA algorithm. To avoid introducing noise into the topic modelling algorithm from the free text of articles, we performed the following preprocessing steps (implemented in Java): (a) remove all punctuation and escape codes; (b) exclude all stop-words using the stop-words list from the Text Categorization Project;23 (c) convert all words to their lemmas by applying the stemming procedure of the Krovetz stemmer;24 and (d) exclude articles with no words in their abstracts or with fewer than three letters in their titles.
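The cleaning steps (a) to (d) can be sketched as follows. The study implemented them in Java; the Python version below is only an illustrative outline under several assumptions: the stop-word set is a tiny placeholder rather than the Text Categorization Project list, the stemmer is a pass-through stand-in for the Krovetz stemmer, lower-casing is added for convenience, escape codes are not handled, and the exclusion check of step (d) is applied first.

```python
import string

# Tiny placeholder; the study used the Text Categorization Project list.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "with"}

# Maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def identity_stem(word):
    """Stand-in for the Krovetz stemmer, which is not reimplemented here."""
    return word


def preprocess(title, abstract, stem=identity_stem):
    """Return the cleaned tokens of a record, or None if it is excluded."""
    # (d) exclude records with an empty abstract or a title under 3 letters.
    if not abstract.strip() or sum(c.isalpha() for c in title) < 3:
        return None
    text = f"{title} {abstract}".lower()
    # (a) remove punctuation by replacing it with spaces.
    tokens = text.translate(PUNCT_TO_SPACE).split()
    # (b) drop stop-words, then (c) reduce each word with the stemmer.
    return [stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess(
    "Telemedicine in rural care",
    "The use of video-consultation is growing.",
)
print(tokens)
```

A real stemmer can be supplied through the stem argument without changing the rest of the function.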
Research in LDA has explored several heuristic approaches to tune the statistical algorithm, which include identifying a suitable number of topics and a meaningful number of iterations.25 In this study, in order to identify an appropriate number of topics and number of iterations, we considered the reproducibility of LDA along successive runs. Considering that eHealth addresses many different diseases, health conditions and population types, uses numerous different technologies and evaluation study types, and raises ethical, security and legal issues, the number of topics is expected to be in the range of tens or even hundreds. Thus, we performed a series of investigative experiments using different numbers of topics (from K=40 to K=310 with a step of 10) at different numbers of iterations (from 2000 to 10,000, with a step of 2000 iterations). For any given number of topics and iterations, we repeated the experiment 10 times and calculated the similarity between successive repetitions. As a similarity metric we chose the widely used Jaccard distance26 between the group of the 10 top words defining each topic with the respective group of words defining each topic in the subsequent repetition. Jaccard distance J(A, B) between two sets A and B is defined as follows:
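In its standard similarity form, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|, and the corresponding distance is 1 − J(A, B); the similarity form is assumed in the sketch below. For two groups of 10 top words that share s words, the union has 20 − s words, so six shared words correspond to J = 6/14 ≈ 0.43. The Python sketch also shows one possible way of computing the share of stable topics between two runs; matching each topic to its best counterpart in the other run, and the names jaccard and stable_fraction, are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stable_fraction(run_a, run_b, min_shared=6):
    """Share of topics in run_a that have a counterpart in run_b with at
    least min_shared top words in common (best-match pairing is assumed)."""
    matched = 0
    for topic in run_a:
        best = max(len(set(topic) & set(other)) for other in run_b)
        if best >= min_shared:
            matched += 1
    return matched / len(run_a)


# Invented top-word groups (single letters stand in for words).
run_a = [list("abcdefghij"), list("klmnopqrst")]
run_b = [list("abcdefxyzw"), list("klmn012345")]

similarity = jaccard(run_a[0], run_b[0])   # 6 shared words out of 14 distinct
fraction = stable_fraction(run_a, run_b)   # only the first topic is stable
```

Raising min_shared to seven or eight gives the stricter similarity strengths.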
Manual topics labelling
The authors of this article screened independently the top 20 words of each topic and their respective probability to belong to the topic and manually devised a short label (title) for each topic; to visually aid this task, normalised word probabilities were used to create word clouds using the Kumo Java word cloud library.27 The researchers discussed their findings and agreed on a consolidated topics list. Subsequently, the researchers screened the topics and manually organised them in conceptual categories via context similarity.
Trend analysis
To analyse topic trends, we followed the approach proposed by Priva and Austerweil.28 First, the weight of each topic for each document was calculated as the percentage of the document words that belong to the topic. Then, the popularity of the topic, defined as the yearly topic contribution estimate P(t, y) of the topic (t) for each year (y), was calculated as the mean of the weight of this topic over all documents published in that year (Dy):
Finally, we applied a moving average (over a three-year interval) to smooth out short-term fluctuations. We also used linear regression to identify the positive or negative trend for each topic. Correlations, where needed, were deduced via the Pearson correlation coefficient, using a two-tailed t-test to estimate significance. Correlation graphs were drawn using the Gephi v0.9.2 open source software for network visualization.29
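Written out from the description above, the yearly estimate takes the form P(t, y) = (1/|Dy|) Σ w(t, d), summed over all documents d in Dy, where w(t, d) is a symbol introduced here for the weight of topic t in document d. A minimal Python sketch of this calculation follows; the input layout (a year paired with the list of topic identifiers assigned to the words of a document) and the function name yearly_popularity are assumptions made for illustration.

```python
from collections import defaultdict


def yearly_popularity(documents, num_topics):
    """Return {year: [P(t, y) for each topic t]}.

    documents is an iterable of (year, word_topics) pairs, where
    word_topics lists the topic assigned to each word of one document.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for year, word_topics in documents:
        if not word_topics:
            continue  # a document without words carries no weight
        counts[year] += 1
        for t in range(num_topics):
            # w(t, d): fraction of the document's words that belong to topic t
            sums[year][t] += word_topics.count(t) / len(word_topics)
    # Mean weight over all documents published in the same year.
    return {year: [s / counts[year] for s in sums[year]] for year in sums}


# Invented toy input: three documents, two topics.
documents = [
    (2000, [0, 0, 1, 1]),
    (2000, [0, 0, 0, 0]),
    (2001, [1, 1]),
]
popularity = yearly_popularity(documents, num_topics=2)
```

In this toy input, topic 0 accounts for half of the first document and all of the second, so its estimate for 2000 is 0.75.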
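The smoothing, regression and correlation steps can be sketched in plain Python as follows. This is an illustrative outline only: the handling of the first and last year in the moving average (averaging over the neighbours that exist) is an assumption, and the function names are introduced here. The sketch returns the t statistic of the correlation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom; converting it to a two-tailed p-value requires the t distribution from a statistics library.

```python
import math


def moving_average(values, window=3):
    """Centred moving average; edge points use the neighbours that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def _sums(x, y):
    """Centred sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def linear_fit(x, y):
    """Least-squares slope, intercept and coefficient of determination."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r2


def pearson_with_t(x, y):
    """Pearson r and its t statistic (n - 2 degrees of freedom)."""
    _, _, sxx, syy, sxy = _sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return r, t


# Invented toy series, for illustration only.
smoothed = moving_average([1, 2, 3, 4, 5])
slope, intercept, r2 = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
r, t = pearson_with_t([1, 2, 3, 4], [1, 2, 3, 5])
```

The sign of the slope gives the direction of the trend, and r2 can be compared against the chosen threshold for a significant linear fit.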
Results
Search results
The PubMed query (performed on 9 May 2018) returned 30,425 publications (total XML file size of 275 MB). Preprocessing excluded 6437 publications with no abstracts (21% of all retrieved records). The final corpus included title, abstract and keywords of 23,988 publications, corresponding to a total of 2,781,905 words that correspond to a vocabulary of 51,199 words.
The earliest record retrieved dates to 1947; however, the earliest record included in the final corpus dates to 1975 (as records with no title or abstract were excluded from the corpus). The records retrieved have been published in 3574 journals, which correspond to about 11% of the total number of journals currently indexed in PubMed (as retrieved on 21 December 2018 from the online PubMed journal list available at https://www.nlm.nih.gov/bsd/serfile_addedinfo.html). Just eight journal titles have published 26% of all eHealth articles, while 3030 journals have published fewer than 10 articles each and 1284 journals have published only one article each. The top eight journal titles are: Journal of Telemedicine and Telecare, SAGE; Studies in Health Technology and Informatics, IOS Press; Telemedicine and e-Health, Mary Ann Liebert; Journal of Medical Internet Research, JMIR; Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE; Journal of Medical Systems, Springer; JMIR mHealth and uHealth, JMIR; and International Journal of Medical Informatics, Elsevier. When seen as a proportion of the overall PubMed corpus, the eHealth domain, as represented in the retrieved results, corresponds to less than 0.11%.
Figure 4 shows the yearly distribution of publications included in the corpus, as an absolute value (diagram on the left) and as a percentage of the total number of articles indexed in PubMed for each year (diagram on the right). The earliest article in the corpus was published in 1975. Until 1991, fewer than 10 eHealth-related articles were published each year, while from 1992 to 1994, fewer than 100 papers were published each year. The eHealth field shows a significant increase from 1995 onwards, with a mean increase in the number of papers of about 12% per year during the last 20 years. The eHealth domain also shows a growth higher than the growth of the entire PubMed corpus, with a mean increase of eHealth corpus proportion of about 7% per year for the last 20 years.

eHealth publications per year in PubMed; (a) percentage of total PubMed publications; (b) absolute number.
LDA algorithm tuning
The results of stability evaluation between 10 successive algorithm runs for different number of iterations are shown in Figure 5. The algorithm shows convergent similarities for the experiments corresponding to 8000 and 10,000 iterations; therefore, subsequent experiments were performed at 10,000 iterations.

Percentage of topics with at least six similar words out of the 10 top words constituting each topic for 10 successive algorithm runs plotted against the number of topics and for different number of iterations.
The results of stability evaluation for 10 successive runs at 10,000 iterations for different numbers of topics and for different similarity strengths are shown in Figure 6. The first local maximum of similarity (i.e. stability) was observed at 110 topics for all three similarity strengths.

Percentage of topics with at least X similar words out of the 10 top words constituting each topic for 10 successive algorithm runs plotted against the number of topics. Different graphs correspond to different similarity strength, i.e. at least six or seven or eight similar words out of the top 10 words of each topic.
Topics modelling
Screening of the 110 topics led to the identification of 100 meaningful topics (91% of all topics) which were organised by the experts in nine categories as follows:
General: eight topics discussing general concepts in the eHealth domain, such as 'technology advancements', 'current approaches', 'systematic reviews', etc.
Service model: 22 topics that discuss the remote health delivery method. Examples include topics corresponding to different eHealth implementation frameworks (e.g. as defined in Van Dyk16), i.e. 'eHealth', 'mHealth', and 'telemedicine'. This category also includes topics related to the patient group involved (e.g. chronic disease patients, elders, ethnic minorities, veterans, family caregivers) and topics related to different health services supported (e.g. diagnosis, patient consultation, primary healthcare, rehabilitation, clinical decision support, etc.).
Disease: 16 topics corresponding to specific diseases (e.g. diabetes, hypertension, stroke, cancer, etc.).
Medical specialty: 10 topics pertaining to specific medical specialties (e.g. psychiatry, surgery, pathology, dermatology, paediatrics, etc.).
Behaviour and lifestyle: eight topics discussing concepts related to healthy lifestyle and behaviour (e.g. substance abuse, smoking cessation, weight loss, behaviour change).
Education: four topics related to formal and informal medical education and health information dissemination.
Technology: 15 topics addressing different technologies and related issues employed in eHealth (e.g. videoconferencing, wearables, social media, electronic health record, standards).
Evaluation: 12 topics related to eHealth evaluation concepts (e.g. clinical cohort studies, randomised control trials, pilot evaluation, quality of life assessment).
Regulatory issues: five topics addressing ethical, legal, financial and technology acceptance concepts.
Appendix 1 shows the list of identified topics organised in categories. The overall popularity of each topic is presented as the percentage of the overall topic contribution and is used to calculate the rank of the topic in the entire list (the most popular topic is ranked first). Within each category, topics are organised in two groups, corresponding to positive and negative trends respectively; within each group, topics are listed in descending order of the absolute value of the regression coefficient.
Synthesis of results
The top 10 most popular topics over the entire time span are shown in Figure 7. The two most popular are general topics discussing current approaches (6.44% total topic contribution) and technology advancements (4.27%); the third most popular is a topic on primary healthcare service model (2.71%). There are four topics related to evaluation studies (intervention assessment; clinical cohort studies; pilot evaluation studies; and patient satisfaction) and three topics discussing technology (system integration; development methodology; and wearable and sensor devices).

The top ten most popular topics over the entire time span.
Linear regression analysis showed a significant linear fit (R² > 80%) for 47 topics; 31 topics showed a positive trend and 16 topics showed a negative trend. The five topics with the highest positive trend are: (a) intervention assessment; (b) depression and related disorders; (c) literature review; (d) mHealth; and (e) clinical cohort studies. The five topics with the highest negative trend are: (a) videoconference; (b) radiology; (c) software application; (d) technology advancements (reviews); and (e) telemedicine.
The first category defined by the authors includes topics related to general concepts. Only the topic on literature review shows a significant increasing trend, while the rest of the general topics show a decreasing or indifferent trend.
In the service models category, topics with significant linear fit and increasing popularity include mHealth, eHealth, chronic disease (self)management, rehabilitation, veteran care, and ethnic minority health. Topics with significant linear fit and decreasing popularity include patient-physician communication, diagnosis, telemedicine, and military/space/disaster medicine. Trend graphs of the remaining topics in this category (with a non-significant linear fit) are shown in Figure 8. Figure 8(a) shows that topics related to remote consultation or telephone advice decline in popularity after a peak in the early 2000's, while remote patient monitoring increases in popularity after the early 2000's. Figure 8(b) shows that topics related to home care decline after a peak in the 2000's, while the area of informal care by family caregivers is gaining popularity. Figure 8(c) shows that topics related to remote and urban health services decline, while health in developing countries is increasing in popularity. Finally, topics related to clinical care services (primary healthcare, clinical decision support, and hospitalization reduction) sustain their popularity, as shown in Figure 8(d).

Popularity trends of service model category topics that do not present a significant linear fit; (a) topics related to remote consultation or monitoring; (b) topics related to home care; (c) topics related to health services to remote and underprivileged areas; (d) topics related to clinical care services.
The five most popular topics of those related to specific disease are: (a) depression; (b) chronic heart disease; (c) diabetes; (d) stroke; and (e) acute heart disease. All disease-related topics present increasing trends; exceptions are two topics related to acute illness, namely acute heart disease and trauma. Figure 9 shows the trends of all disease topics organised in four different graphs: Figure 9(a) shows mental and cognitive disease topics; Figure 9(b) cardiovascular disease; Figure 9(c) chronic disease; and Figure 9(d) miscellaneous disease topics.

Popularity trends of topics related to different disease; (a) topics related to mental and cognitive diseases; (b) topics related to cardiovascular disease; (c) topics on various chronic diseases; (d) topics on miscellaneous diseases.
The topics related to different medical specialties are shown in Figure 10, organised in four different graphs: Figure 10(a) shows topics related to diagnostic laboratories; Figure 10(b) shows topics related to perinatal care; Figure 10(c) shows topics related to surgery; and Figure 10(d) shows miscellaneous specialties. In general, medical specialty related topics present decreasing trends, except for two topics related to infant care and paediatrics that show an increasing popularity (Figure 10(b)).

Popularity trends of topics related to medical specialties; (a) topics related to diagnostic laboratories; (b) topics related to perinatal care; (c) topics related to surgery; (d) topics on miscellaneous medical specialties.
An interesting category is the one with topics related to behaviour change and lifestyle, shown in Figure 11. All eight topics show an increasing trend, especially during the last decade; behaviour change turns out to be the most popular. In the category related to education, topics on eHealth literacy and health information show an increasing trend, while topics related to medical education and online information present a decreasing popularity (Figure 12).

Popularity trends of topics related to behaviour and lifestyle.

Popularity trends of topics related to education and information dissemination.
The topics related to different technologies in eHealth are shown in Figure 13. Topics addressing personal media (Figure 13(a)) show varying trends: wearables and body sensor networks decline after a peak around the late 2000's; in contrast, mobile applications, text messaging and social media show significantly increasing trends in the last decade. Signal and image processing and analysis topics (Figure 13(b)) show in general a rather declining trend, apart from the topic on data collection and analysis, which remains popular through the years. Among the topics related to system development (Figure 13(c)), system integration shows a decreasing popularity; standards and interoperability peaked in the late 2000's and has been decreasing in popularity since then; security saw a steep increase during the early 2010's and has been decreasing over the last five years; discussion on development methodology, however, is increasing in popularity. Finally, videoconference shows a steep decline, while electronic health records remain within focus for the entire year span of this study (Figure 13(d)).

Popularity trends of topics related to various technology aspects; (a) topics related to personal media technologies; (b) topics pertaining to signal and image processing; (c) topics related to system development; (d) topics on miscellaneous technologies.
Topics in the evaluation category present an overall increasing trend (Figure 14). Notable trends include cost-effectiveness studies, which remain of comparatively low popularity, while assessment of intervention is high and increasing (Figure 14(b)). Qualitative assessment studies (Figure 14(c)) are also of increasing popularity, with interactive methods such as focus groups and interviews overtaking more passive evaluation methods based on questionnaires. It is important to note that approaches evaluating the effect of eHealth interventions in the clinical setting (Figure 14(d)) show rather exponentially increasing trends during the last decade.

Popularity trends of topics related to evaluation issues; (a) general topics pertaining to evaluation; (b) topics on intervention evaluation; (c) topics on qualitative assessment studies; (d) topics on systematic evaluation of intervention effects.
Finally, topics in the regulatory issues category show rather indifferent, slightly decreasing trends, with the exception of technology adoption and barriers, which increased considerably during the early 2000's and has maintained its popularity thereafter (Figure 15).

Popularity trends of topics related to regulatory issues.
Pearson correlation of trends amongst all topics showed that 38 topics were significantly correlated with at least one other topic (correlation coefficient >0.9 and p ≪ 0.001). These highly correlated topics are shown in a network graph in Figure 16. Topics are represented by nodes, and correlations are shown as edges between nodes. The colour indicates the node degree, i.e. the number of edges for each node, which varies from a minimum value of one to a maximum value of 18. The network graph reveals three clusters of correlated topics. The top left, standalone cluster corresponds to topics with decreasing trends; the top right cluster corresponds to topics that show an increasing trend with a higher increase after mid 2000's; finally, the large cluster at the bottom (connected to the top right cluster) corresponds to topics with increasing trends.
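The construction of such a graph can be outlined in a few lines of Python: every pair of topic trends whose Pearson coefficient exceeds the threshold becomes an edge, and the degree of a node is the number of edges that touch it. This is a minimal sketch under stated assumptions; the significance test is omitted, the function names and the toy trend series are invented for illustration, and the layout and colouring done in Gephi are not reproduced.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_graph(trends, threshold=0.9):
    """Return (edges, degree) for topic pairs correlated above threshold."""
    edges = [
        (a, b)
        for a, b in combinations(sorted(trends), 2)
        if pearson(trends[a], trends[b]) > threshold
    ]
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return edges, degree


# Invented yearly popularity series, for illustration only.
trends = {
    "rising_1": [1, 2, 3, 4, 5],
    "rising_2": [2, 4, 6, 8, 11],
    "falling_1": [5, 4, 3, 2, 1],
    "falling_2": [10, 8, 6, 4, 1],
    "flat": [1, 3, 1, 3, 1],
}
edges, degree = correlation_graph(trends)
```

In this toy input the two rising series and the two falling series each form a connected pair, while the oscillating series stays out of the graph because it does not correlate with any other.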

Network graph depicting the 38 topics that show significant trend correlation with at least one other topic. Topics are depicted as nodes and edges represent correlations. Colours are used to highlight node degree, i.e. number of edges for each node.
Discussion
Probabilistic topics modelling in a corpus of eHealth publications was used to identify major topics in eHealth scientific literature and calculate trends for the last 20 years. The LDA algorithm used in this study has been shown to exhibit the highest performance in topics modelling of textual corpora, albeit of a different nature.17 Stability tuning of the LDA identified a first local stability maximum at 110 topics. Manual topic examination by two experts identified 100 meaningful topics which were organised in nine categories and analysed in terms of trends for the last 20 years.
Overall, the analysis indicates a statistically significant increasing trend of eHealth publications compared to the overall PubMed corpus growth. An earlier literature survey on telemedicine publications looking into MEDLINE (the largest subset of PubMed) showed that in 2003 the percentage of total telemedicine papers in the period 1964–2003 was 0.05%.7 Records retrieved in our study for the same period show that eHealth papers are 0.046% of total papers in PubMed for this time period, which confirms prior results. A more recent, comprehensive bibliometric analysis of the field as depicted in PubMed retrieved 17,932 records till the end of 2013.8 The equivalent time span in our search retrieved 18,873 records, i.e. 62% of the total retrieved records. The higher number of records retrieved in our study is probably due to differences in the query formulation. The query in this previous study8 used terms referring to telemedicine and the specific case of videoconferencing, whereas our query was formulated around generic terms including the entire span of eHealth concepts and excluding terms specific to technology.
Considering general topics, our findings show that the topic telemedicine is in general more abundant than the topic eHealth, however the former shows a steeply declining trend as opposed to eHealth which is increasing during the recent years (Figure 17). This confirms in general the findings of an earlier bibliometric analysis of 11,644 papers retrieved from the Scopus literature indexing database including records up to 2012.30 Although the two topics show a convergent trend in our study, eHealth (as topic) has not yet surpassed telemedicine, as is predicted in the earlier study.30

Trends for the popularity of topics eHealth and telemedicine. Although telemedicine is decreasing in popularity, it is still more popular than eHealth which increases.
Concepts that relate to the medical domain of eHealth application are expressed via two large topic categories pertaining to specific disease or medical specialties. Analysis of these categories shows that chronic diseases demonstrate an increasing popularity from early 2000's onwards, with a tendency to plateau during the last five years; exceptions include hypertension which shows a decline during the last decade and chronic pain which gains popularity. All topics related to acute disease (e.g. acute heart disease, trauma, surgery) show decreasing trends. Cancer and infectious disease, as well as paediatrics and neonatal care show overall low, but steadily increasing popularity during the last 10 years. Noteworthy is an entire category of topics addressing lifestyle and behavioural issues: all related topics show a clear, continuous increase over the last 20 years; especially, the concept of behaviour change seems to gain significant popularity over the last five years. On the contrary, diagnostic laboratory related topics (especially radiological and pathology imaging), although amongst the most popular topics in the previous decades, show a steep decline in recent years. This confirms findings of a previous bibliometric study of 967 papers on telepathology retrieved from PubMed indexing database for the period 1986-2010; the study showed a decline in telepathology after a peak in 2000.31
Analysis of concepts that relate to the service model supported by eHealth shows that remote consultation and health services (including home care) peaked around the early 2000s and declined thereafter. Remote patient monitoring and home care for the elderly, by contrast, peaked around 2010 and show a slight decline in the last few years. During the last decade, eHealth for developing countries increased significantly, while eHealth for family caregivers has emerged. Hospitalization reduction and clinical decision support (both topics related to clinical services) have made a rather low, albeit stable, contribution during the last 20 years. On the other hand, support of rural health and especially primary healthcare exhibit higher popularity.
Focus on aspects of eHealth technology has been shifting during the last 20 years. At the beginning of this period, significant attention was drawn to system integration, videoconferencing and medical imaging, which, however, quickly declined in the following years; this confirms similar crude findings of a decline in videoconferencing in the period 2009–2013 as compared with before 1995.8 Standards drew attention for about a decade, but their popularity has been decreasing in recent years. Discussion on security emerged in the early 2010s and peaked around 2015. Wearables and sensor devices saw a surge of interest around the mid 2000s and are now being surpassed by mobile applications and text messaging, which have been gaining popularity during the last five years. The findings on the mHealth topic trends strongly correlate with those reported by a recent bibliometric analysis of 5465 mHealth articles retrieved from the SciVerse Scopus literature indexing database for the time span 2006–2016.32 Figure 18 shows comparative trends for mHealth as calculated by our topic modelling study and by the previous bibliometric analysis;32 the findings of the two approaches were found to correlate strongly (Pearson correlation coefficient = 0.99, with a high significance of p ≪ 0.0001).

Figure 18. Trends for mHealth as depicted by the topic mHealth in our study and by a bibliometric analysis of mHealth literature (see Table 2 in Sweileh et al.).32 Trends calculated by the two different approaches correlate strongly: Pearson correlation coefficient = 0.99 with a high significance of p ≪ 0.0001.
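The comparison behind the reported correlation can be illustrated with a minimal sketch, assuming two yearly series of equal length: one holding the yearly popularity of a topic and one holding yearly article counts from a bibliometric study. The function name `pearson_r` and the toy values below are hypothetical and are not the data of this study or of Sweileh et al.; the sketch computes only the coefficient, not its significance.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy series (NOT the study data): one value per year
toy_topic_share = [0.1, 0.2, 0.4, 0.8]   # yearly topic popularity
toy_article_counts = [10, 20, 40, 80]    # yearly article counts

r = pearson_r(toy_topic_share, toy_article_counts)
print(f"Pearson r = {r:.2f}")
```

Because the coefficient is invariant to the scale of each series, topic proportions and raw article counts can be compared directly. A significance value such as the one reported would require an additional test; `scipy.stats.pearsonr`, for example, returns both the coefficient and a p-value.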
Evaluation in eHealth is gaining popularity and is shifting from methods that are easier to analyse (e.g. patient satisfaction, questionnaires and surveys) to approaches that allow for more in-depth understanding (interviews and focus groups). Clinical evaluation studies of eHealth intervention effects (including the more rigorous randomised controlled trials) show continually increasing popularity. Overall, eHealth intervention assessment shows a significant increase in popularity, in contrast to cost-effectiveness analysis which, although rising, makes a much smaller contribution.
A previous crude analysis of term frequencies in a corpus of eHealth abstracts for two time spans (before 1995 and 2009–2013)8 showed a shift from diagnostic laboratories in the early days towards metabolic and cardiovascular disease, home and monitoring; the same study also identified a shift from technology-related terms towards clinical applications, while cost analysis showed low frequency over the entire time span. The results of the topic modelling presented here agree with all these prior findings and present the yearly trends of these topics in detail.
Major limitations of the study include the restriction to the corpus available in the PubMed indexing database and the subjective naming and grouping of the topics. PubMed is an indexing database containing information about articles published in more than 30,000 journals in the biomedical sciences. As with any other database, errors are to be expected, including incomplete coverage and erroneous entries. This was minimised by excluding records from the last year, which are usually incomplete. The query used to retrieve the initial corpus was limited to general search terms corresponding to the top-level definitions of eHealth. More specific terms corresponding to particular specialties, service models or technologies (e.g. teleradiology, telepathology, telecardiology, teleconsultation, telerehabilitation, Internet, mobile, etc.) were not considered, in order to avoid biasing the corpus; this might have resulted in a number of papers being left out. However, this number was minimised by the MeSH term expansion of the query supported by the PubMed search engine. Another limitation relates to the manual labelling of topics, which is subjective. Although labelling was performed independently by the two authors and was based on the top 20 words describing each topic, the final one- or two-word label can be rather limiting in conveying the full context of the topic. A better approach would be the use of word clouds; these were computed and used successfully by the authors during topic labelling, but they are difficult to display in the limited space of this manuscript.
Overall, the eHealth topics analysis presented in this paper shows that eHealth is a blossoming research field, which emerged in the middle of the twentieth century and has exploded during the last two decades. The field is also thriving with a large number of disparate topics; tuning of the LDA in this work showed a first local stability point around 110 topics, while further local stability points can be identified around 160 and 250 topics. Further fine tuning of the LDA approach (using finer increment steps for the number of topics) may identify more local stability points at larger topic numbers, thus leading to a more detailed analysis of the field. Trends analysis shows a continuous shift in focus. Early emphasis on medical image transmission and system integration has been replaced by increased focus on standards, wearables and sensor devices, now giving way to mobile applications, social media and data analytics. Attention on disease is also shifting, from an initial popularity of surgery, trauma and acute heart disease, to the emergence of chronic disease support, and the recent attention to cancer, infectious disease, mental disorders, paediatrics and perinatal care; most interestingly, the current swift increase is in research related to lifestyle and behaviour change. The steady growth of all topics related to assessment and various systematic evaluation techniques indicates a maturing research field that is moving towards real-world application.
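The idea of scanning a grid of candidate topic numbers for local stability points can be sketched as follows. This is an illustrative sketch only: the stability criterion used here (a model quality score that changes by less than a relative tolerance with respect to both grid neighbours), the function name `local_stability_points` and the toy scores are all assumptions, and do not represent the tuning procedure or the values obtained in this study.

```python
def local_stability_points(scores, tolerance=0.01):
    """Return the topic numbers whose score differs from both grid
    neighbours by less than `tolerance` (relative change).

    `scores` maps a candidate number of topics to a model quality score
    (hypothetical; e.g. a perplexity-like measure computed elsewhere).
    """
    topic_numbers = sorted(scores)
    stable = []
    # Slide a window of three consecutive grid points over the sorted grid
    for prev_k, k, next_k in zip(topic_numbers, topic_numbers[1:], topic_numbers[2:]):
        reference = abs(scores[k])
        if reference == 0:
            continue  # relative change is undefined for a zero score
        change_left = abs(scores[k] - scores[prev_k]) / reference
        change_right = abs(scores[next_k] - scores[k]) / reference
        if change_left < tolerance and change_right < tolerance:
            stable.append(k)
    return stable


# Hypothetical toy scores on a coarse grid (NOT values from this study)
toy_scores = {50: 1200.0, 60: 1100.0, 70: 1095.0, 80: 1092.0, 90: 1000.0, 100: 950.0}
print(local_stability_points(toy_scores))
```

With such a criterion, a finer increment step simply adds more keys to `scores`, so plateaus that fall between coarse grid points become detectable.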
Probabilistic topics and trends analysis of the eHealth research literature shows a multidisciplinary, ever-expanding and continually shifting field, with many topics of interest, each with its own evolution. More detailed study within each topic would be required to extract best practices and lessons learnt, and to reveal the technological, socioeconomic and other strategic factors affecting the shift in research interest and the evolution of research within each area of the eHealth field.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was partly supported by the European Commission FP7-ICT project CARRE (Grant No. 611140) and the corresponding Greek National Matching funds of the General Secretariat of Research and Technology (GSRT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
