Abstract
Identifying emerging technological topics is of great interest to decision makers, but doing so usually runs into two problems. First, researchers face an overwhelming amount of information on many subjects. Second, the impact of the topics is usually ignored. This paper describes a new overlap method for locating emerging topics in a specific technology domain. Two models of technological patents, namely those based on the direct citation and genetic approaches, were combined to identify new and persistent clusters. The method was then applied to the technological domain of solar photovoltaics. Fifteen emerging topics were identified for the period 1980–2010, and these topics were evaluated against the history of innovation and development of solar cells. These topics were characterized in various ways to understand the motive forces behind their emergence. Results indicated that the methodology could be a useful tool for identifying emerging technological topics.
Introduction
The evolution of topics, including emerging topics in science and technology, has been of interest to governments, companies, and individual scientists for a number of years. Research in this area has come in waves. A Web of Science search for “emerging technology” returns over 27000 articles published in 1985–2016, which shows both the long-term and the recent interest in emerging technologies.
However, most studies of emerging technologies are retrospective analyses of pre-determined areas rather than methodological studies [1]. The few methodological studies are often problematic: they confront researchers with an overwhelming amount of information, and they do not take into account the impact of a topic, which is an important characteristic of emergence.
Therefore, this paper proposes a combined technique, based on a patent citation database, for identifying emerging topics that are both new and of high impact in a specific technological domain, and uses that technique to nominate a list of 15 emerging topics from the patents of the Solar PV domain. These topics were evaluated by searching for the motive forces behind their emergence. The evidence presented suggests that the technique nominates a viable list of emerging topics suitable for inspection by decision makers.
State of the art
Defining emergence
The concept of emergence is “widely used but seldom defined” [1]. Rotolo et al. defined emerging technologies by five attributes: radical novelty, fast growth, coherence, prominent impact, and uncertainty and ambiguity [2]. Small et al. regarded novelty and growth as the two most important properties associated with emergence [3]. Furukawa et al. claimed that an emerging technology often causes technological discontinuity and has important impacts on existing industries [4]. Comparing these properties with those from other definitions, there is nearly universal agreement on two properties: novelty and impact.
Identifying emergent topics
As mentioned above, despite the long-term and recent interest in emerging technologies, most studies are retrospective analyses of pre-determined areas rather than methodological studies for identifying emerging technologies [1]. Such studies cannot identify currently emerging topics and therefore cannot satisfy the requirements of funding bodies and practitioners worldwide [3].
Some researchers have proposed techniques related to emerging technologies. For example, Bass developed the Bass Model to study new product adoption, and it has since been used in technology forecasting [4]. Altshuller proposed the TRIZ tool to help people understand how new technologies emerge [5]. Robinson et al. proposed a ten-step approach to generating forecasts [6]. Yu and Lee [7] used self-organizing maps (SOM) and text mining to identify emerging technologies.
However, these diverse approaches were far from straightforward and often relied on widespread electronic communication and publication of information [8].
One way to reduce the information load is to focus on patent analysis. The use of patents as an indicator of innovative activity is well established in the innovation literature [9]. Although patents have several shortcomings, their increasing importance has been highlighted by many researchers [9–12]. Patent and citation data are increasingly used to identify emerging technologies.
Co-citation modeling is one of the main methods used to identify emerging clusters [8]. Small used clusters of highly co-cited patents that were linked from year to year to detect emergence [7]. According to Small, although thresholds and normalizations have changed, the basic process of creating annual co-citation clusters and linking those annual clusters into longitudinal strands or threads has changed very little over the past 35 years [3].
Besides co-citation modeling, Daim et al. used bibliometrics and patent analysis to forecast emerging technologies for three technology areas [13]. Eusebi and Silberglitt used the patent classification method to identify and analyze emerging technologies [14]. Direct citation, the technique at the core of Garfield’s historiography, was also used by Cho and Shih [15] to identify core technologies and emerging technologies in Taiwan. Fallah et al. used forward patent citations as predictive measures for the diffusion of emerging technologies [16]. Yuya et al. also used the direct citation method to track emerging technologies in energy research [17].
However, most of the studies that have developed patent-based methods for identifying emerging topics share two obvious shortcomings. First, most of them focused on creating structure from a data set using co-occurrence or direct clustering methods and then looking for novelty within that structure [1], but they often ignored the other very important property: emerging topics have an important impact on existing industries (i.e., they have the character of persistence). Second, most of these studies need to search through the mass of patents in different technological domains to locate the emerging technologies of interest to researchers and analysts, so the sheer scale of information on a particular technological domain can often be overwhelming.
According to Martinelli and Nomaler [11], a persistent patent in a domain is one that affects future technological development, so that its contribution persists in the technology. They proposed a genetic model that identifies persistent patents by decomposing each patent’s knowledge and calculating its persistence index.
To identify emerging clusters that satisfy the requirements of both novelty and persistence using patent data, this study proposes a new overlap method for identifying emerging topics in a specific domain. The concept of a technological domain was defined by Magee et al. [18]. We combined the direct citation model and the genetic model to nominate 15 topics as novel and persistent. Evidence was gathered and presented to show that these topics are reasonable in many ways, which suggests that the methodology produces useful results.
The remainder of this manuscript is organized as follows. Section 3 describes the overlap methodology. Section 4 presents the nominated emergent topics and their characterization, followed by the motive forces associated with those topics. Conclusions are summarized in Section 5.
Methodology
Data
Magee et al. [18] defined a technological domain as “A technologically differentiated field that includes technological artifacts that meet a specific functional requirement or purpose using knowledge of a distinct scientific/technical field”. To find the relevant and complete set of patents that represent a particular technological domain, Benson and Magee developed a relatively simple, objective and repeatable method called the Classification Overlap Method (COM) [19, 20]. The method uses a keyword search of US patents since 1976 to locate the most representative international and US patent classes and then determines the overlap of those classes to arrive at the final set of patents.
Figure 1 shows a schematic of the process [19]: Step 1 is pre-searching US issued patent titles and abstracts for the search terms. The input to the COM is simply a set of search terms that can be entered into a text box; Step 2 is ranking the IPC and UPC patent classes that are most representative of the technology; Step 3 is selecting the overlap of the most representative IPC class and UPC class.
The initial pre-search was done using the patent search tool PatSnap, which searched all U.S. patents from 1976 to 2013. The final database covers a broad range of technologies, including 28 technological domains. The overall size of the patent sets ranges from 154 (Flywheel Energy Storage) to 149491 (Integrated Circuit Processors), and typically 80–90% of the patents are found relevant to the domain by reading samples [19, 20].
The Solar PV domain defined by them contains 5203 patents. These 5203 patents cite 78012 patents in total, 19139 of which are from the Solar PV domain itself; the others belong to other technological domains.
Direct citation model
We selected the direct citation model to identify the novel clusters because it has been shown to be much more selective than the co-citation model. This could be due to the longer durations of direct citation clusters (i.e., threads) compared with co-citation clusters [3]. Patents and citations have already been used to build several types of knowledge networks, the nodes of which are firms, inventors [21, 22], or technological classes [23]. Normally, if patent A cites patent B, we can assume a technological relation between the inventions disclosed by the two patents.
In this study, citation links between patents are used to cluster the full set of Solar PV patents in a single clustering process. As mentioned above, clustering methods have been developed by many researchers [24–26]. Here, the direct citation clustering was done using the VOS methodology and algorithm recently developed by Waltman and Van Eck at Leiden University [26]. This algorithm uses a variant of modularity-based clustering, which attempts to maximize the ratio of links within clusters to links between clusters.
Generating the direct citation clusters is simple in both concept and method. First, the relatedness of patents is determined: for each pair of patents in the Solar PV domain, relatedness is based on the direct citation relations between them. Second, the patents are clustered with the VOS technique: the file created in the first step is input directly into the VOS software, and the clustering function of the tool is applied.
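The first of these two steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name `build_relatedness`, the toy patent IDs, and the choice to treat each citation as one undirected link are all assumptions made here, and the clustering itself is left to the VOS software.

```python
def build_relatedness(domain_patents, citations):
    """Return undirected citation links inside a domain and its isolated patents.

    domain_patents: iterable of patent IDs in the domain (toy IDs below).
    citations: iterable of (citing, cited) pairs.
    """
    domain = set(domain_patents)
    links = set()
    for citing, cited in citations:
        # Keep only citations whose two ends both belong to the domain.
        if citing in domain and cited in domain and citing != cited:
            # Assumption: relatedness is symmetric, so each pair is stored once.
            links.add(tuple(sorted((citing, cited))))
    linked = {p for pair in links for p in pair}
    isolated = domain - linked
    return sorted(links), sorted(isolated)


# Toy data, invented for illustration only.
patents = ["P1", "P2", "P3", "P4"]
cites = [("P2", "P1"), ("P3", "P1"), ("P3", "X9"), ("P1", "P2")]
links, isolated = build_relatedness(patents, cites)
print(links)     # pairs that would be written to the clustering input file
print(isolated)  # patents with no in-domain citation link
```

Patents that end up in `isolated` have no citation link to any other patent in the domain, so a citation-based clustering cannot place them in a cluster.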
Genetic model
The genetic algorithm has been used in a wide range of areas [11, 27]. In this study, we identify the patents that have disruptive impacts on existing technologies using the genetic approach proposed in [11]. That paper proposed a method to calculate the persistence index (PI) of each patent; patents with a higher PI are considered to have important impacts on existing patents. The concept and detailed calculation process of this method are shown in Fig. 2.
Figure 2 shows a network composed of three layers of patents, indicated by TR0, TR1, and TR2. The genetic decomposition of this network quantifies how much of the knowledge of each patent in TR0 and TR1 is retained in the endpoints. First, the endpoints (F and G) are identified and, working backwards, each patent is assigned to a layer. Second, the persistence index of each startpoint in the first layer (TR0) is calculated; the startpoints are then deleted and the network is truncated. Third, TR1 becomes the new layer of startpoints; the persistence index is calculated for this new group of startpoints, and the layer is deleted. Lastly, the second and third steps are repeated up to the last layer.
The only patent cited by patent D is patent A; thus 100% of the inherited knowledge embodied in patent D is the knowledge of patent A. Patent E, in contrast, makes two citations, to patents B and C; thus the inherited knowledge embodied in patent E consists 50% of patent B and 50% of patent C. In TR2, F makes only one citation, to patent D. Since the embodied knowledge in patent D is 100% that of patent A, the inherited knowledge embodied in patent F is again 100% that of patent A. The endpoint G makes two citations. The first is to patent D, which has 100% patent A knowledge; thus 1/2 × 100% = 50% of the inherited knowledge embodied in patent G is the knowledge of patent A. The second citation of patent G is to patent E, so patent G embodies 1/2 × 50% = 25% of the knowledge of each of patents B and C. Thus the persistence index of A is 1.5 (100% + 50%), and the persistence indices of B and C are both 0.25.
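The worked example can be checked with a short Python sketch. It is a simplified reading of the genetic decomposition, not the reference implementation of [11]: it assumes that a patent's inherited knowledge is split equally among the patents it cites, that the PI of a startpoint is the sum of its shares over all endpoints, and that each layer of startpoints is removed before the next is scored. The function name `persistence_index` is hypothetical.

```python
from collections import defaultdict


def persistence_index(cites):
    """cites maps each citing patent to the list of patents it cites."""
    patents = set(cites) | {r for refs in cites.values() for r in refs}
    cited = {r for refs in cites.values() for r in refs}
    endpoints = patents - cited          # patents that nobody cites
    active = set(patents)
    pi = {}
    while True:
        # Citations restricted to the patents still in the (truncated) network.
        refs_of = {p: [r for r in cites.get(p, []) if r in active] for p in active}
        starts = {p for p in active if not refs_of[p] and p not in endpoints}
        if not starts:
            break
        memo = {}

        def shares(p):
            """Fractions of p's inherited knowledge traced to each startpoint."""
            if p not in memo:
                if p in starts:
                    memo[p] = {p: 1.0}
                else:
                    mix = defaultdict(float)
                    for r in refs_of[p]:
                        for s, frac in shares(r).items():
                            mix[s] += frac / len(refs_of[p])  # equal split
                    memo[p] = mix
            return memo[p]

        for s in starts:
            pi[s] = 0.0
        for e in endpoints:
            for s, frac in shares(e).items():
                pi[s] += frac
        active -= starts                 # delete this layer and truncate
    return pi


# Citation network of the example: D cites A; E cites B and C; F cites D; G cites D and E.
example = {"D": ["A"], "E": ["B", "C"], "F": ["D"], "G": ["D", "E"]}
pi = persistence_index(example)
print(pi["A"], pi["B"], pi["C"])  # 1.5 0.25 0.25
```

Under these assumptions the sketch reproduces the values stated in the example for A, B and C; the values it assigns to the TR1 patents after truncation follow from the same equal-split rule.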
Select emerging clusters
First, we identify the clusters that meet the requirement of novelty among the direct citation clusters. As mentioned above, and unlike the co-citation model, direct citation tends to create clusters that are somewhat broader conceptually and of much longer duration than co-citation threads. In part, this is because co-citation threads are only allowed to link sequential years, while many direct citation clusters skip publication years once they have started. We consider the start of a direct citation cluster to be the year at which the cluster has enough critical mass to continue into the future; in other words, we consider a leading tail with only a few patents per year to be a pre-cluster phenomenon. We calculated the start date of a cluster as the first year in which (1) the cluster contains at least 5 patents, and (2) that year is within five standard deviations of the mean publication year for the cluster. Specifically, the approach used here is to check, for a given year, whether each direct citation cluster contains patents that belong to new threads (one or two years old) in the direct citation model.
Second, we calculate the average persistence index (API) of each cluster. The top 20% of clusters with the greatest API across all years are then selected.
Third, we take the overlap of the two selected groups of clusters.
The clusters with both high novelty potential and high API are thus selected and considered to be the emerging clusters that meet the requirements of novelty and persistence.
Lastly, the clusters are labeled. One can use VOSviewer to create a map based on a text corpus. The corpus is extracted from the words of the titles (and/or abstracts) of the patents in each cluster; VOSviewer then creates a co-occurrence map and shows the important words in each cluster based on their frequency.
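The selection steps above can be sketched as follows. The sketch rests on several assumptions that the text leaves open: the 5-patent threshold is read as a cumulative count, the population standard deviation is used, the API of a cluster is the plain mean of its patents' PI values (a patent without a PI counts as 0), and all function names and toy values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def cluster_start_year(years, min_patents=5, max_sd=5.0):
    """First year with >= min_patents patents so far, within max_sd SDs of the mean year."""
    mu, sd = mean(years), pstdev(years)
    counts = Counter(years)
    cumulative = 0
    for year in sorted(counts):
        cumulative += counts[year]  # assumption: the threshold is cumulative
        if cumulative >= min_patents and abs(year - mu) <= max_sd * sd:
            return year
    return None                     # the cluster never reaches critical mass


def average_persistence(cluster_members, pi):
    """API per cluster: mean PI of its member patents."""
    return {c: mean([pi.get(p, 0.0) for p in members])
            for c, members in cluster_members.items()}


def top_fraction(api, fraction=0.2):
    """IDs of the clusters in the top `fraction` by API (at least one cluster)."""
    k = max(1, round(len(api) * fraction))
    return set(sorted(api, key=api.get, reverse=True)[:k])


# Toy data, invented for illustration only.
start = cluster_start_year([1999, 1999, 2000, 2000, 2001])
members = {"c1": ["a", "b"], "c2": ["c"], "c3": ["d", "e"], "c4": ["f"], "c5": ["g"]}
pi = {"a": 1.5, "b": 0.5, "c": 0.25, "d": 0.0, "e": 0.5, "f": 0.1}
api = average_persistence(members, pi)
novel = {"c1", "c3"}                 # clusters flagged as new threads
emerging = novel & top_fraction(api)  # overlap of the two groups
print(start, api, emerging)
```

Taking a set intersection keeps the two criteria independent, so the novelty test or the API threshold can be changed without touching the other.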
Result analysis and discussion
Patent map
We used the methodology introduced in the previous section to construct a classification system based on the 5203 patents in the Solar PV database for the period 2001–2010. All patents and their citations were included. The total number of patents on which the classification system was constructed is 4567 (the other 636 patents have no links to other patents). There are 19139 citation relations between these patents (the 4567 patents and their cluster IDs are available from the author).
Figure 3 maps the 4567 patents of the Solar PV domain in the VOS classification system according to their direct citation relationships. The VOS clustering method distributed the 4567 patents into 622 classes, indicated by different colors.
Emerging clusters
According to the definition of new threads in Section 3.4, we identified 63 clusters that belong to new threads across all years (the 63 clusters are given in Table 1). Figure 4 shows the distribution of the 63 clusters by year (a cluster can appear in two years if it also has a patent published in the second year after the cluster’s start). According to the definition of PI, a high API indicates that the knowledge of a patent cluster has had a far-reaching impact on the development of the technological domain.
In Fig. 4, the 63 clusters are distributed over 26 years between 1976 and 2003. The years 1979 and 1999 were the most “innovative”, with 8 emerging clusters each, followed by 2000, 2001 and 2002, with 7 emerging clusters each. In contrast to these “innovative” years, the early 1990s lacked innovation: the average number of emerging clusters per year in 1990–1995 is only 1.5, much lower than the average in the early 2000s. Thus, the early 2000s were a creative period for the Solar PV domain.
Persistence of clusters
The persistence of each node was then calculated using the genetic approach designed by Martinelli and Nomaler [11]. The persistence network of Solar PV has 22 layers in total. Table 2 shows the top 20 patents with the greatest persistence index.
After this, we calculated the API for each cluster and selected the top 20% of clusters by API. Table 3 lists all of the clusters in this top 20%.
Overlap clusters
Table 4 shows the 15 overlap clusters, which were defined as both “novel” and “persistent” clusters in this study. It is interesting to note that 11 of them are within five standard deviations of the publication year of 2000, indicating that the years around 2000 were a peak period for the development of Solar PV technology. This is consistent with the actual development of Solar PV: in 2000, global Solar PV cell installations exceeded 1000 MW, marking the beginning of the era of solar energy. Table 5 shows the 15 clusters and the patents they contain (weight >2).
Each of the 15 emerging clusters has 5 to 9 patents; the average number of patents per cluster is ∼6, and the average API of these 15 clusters is ∼0.0079. Cluster 67, labeled “photovoltaic module, nanostructure”, has the highest API (0.0167), indicating that the knowledge contained in this cluster has had the biggest impact on recent Solar PV technologies. It is followed by cluster 416, labeled “thermoelectric devices”, with an API of 0.0156. Both emerged in 2002, indicating that 2002 was the most important year for the development of the Solar PV domain.
Motive forces
Another way of thinking about the topics is to consider the reason for their emergence and persistence, for example, the development of related science or the innovation in a technology. In addition, outside or exogenous factors might have played a role too [28–30].
Innovation in technology
We cannot help noticing that the emerging topics vary over time. Before 1990, research focused on photoelectric conversion, and amorphous silicon (a-Si) technology was flourishing. Figure 5 shows NREL’s record of cell efficiencies over the years. From 1980 to 1990 the main cell type was the thin-film cell; at the start of the 1980s thin-film cell efficiency was very low (<10%), and researchers focused on improving it. In 1976, the RCA lab produced the first amorphous silicon solar cell, with a conversion efficiency lower than 1%. The low manufacturing cost of a-Si and the low temperature of its manufacturing process helped it take hold in the thin-film cell domain. Only after 1980, when the Japanese company Sanyo used amorphous silicon solar cells to power pocket calculators, did the amorphous silicon solar cell come into common use around the world.
In the 1990s, the emerging topics centered on cell structures (e.g. electrode structure, tunnel junction, solar array and so on). This coincided with the rapid development of multijunction cells in this period, displayed in Fig. 5, when research was mainly on the two-junction cell.
In the 2000s, some new types of cells appeared, for example organic cells, silicon heterostructures and three-junction cells. Our study identified the nanostructured solar cell as an emerging technology in this period. Researchers also continued to extend the application area of Solar PV, and portable wrist devices became a popular topic in the new century.
Relation of science and the emerging technology
Figure 6 shows how the total number of scientific papers about “solar cell” changed over 28 years (1976–2003). The data were collected from Web of Science. The number of papers grew from 111 in 1976 to 2193 in 2003.
Comparing this with the subgraph that shows the number of emerging clusters in the corresponding years, one finds that the number of papers kept growing from 1976 to 1990 while the number of emerging clusters did not. This is understandable: it is easier to innovate in a technology domain when the technology has just appeared. For this reason, we removed the data of the earlier years and studied only 1990 to 2003. Treating each of the two series as a vector, we calculated the Pearson correlation coefficient using MATLAB. The result is 0.561, which indicates a moderately strong relationship between scientific development and the emerging technology.
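For readers who want to reproduce this kind of comparison without MATLAB, the Pearson coefficient of two equally long series can be computed as in the sketch below. The function name `pearson` and the two toy series are assumptions made for illustration; they are not the paper's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Toy series, invented for illustration only.
papers_per_year = [10, 20, 30, 40]
clusters_per_year = [1, 3, 2, 4]
r = pearson(papers_per_year, clusters_per_year)
print(round(r, 3))  # 0.8
```

With series as short as the 14 annual values used here, a coefficient of this kind describes co-movement only and says nothing about which series leads the other.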
Relation of emerging technology and Solar cell market
Lawless et al. regarded social demand as a key factor influencing technological input [31–35]. Therefore, we searched the Web for data on solar cell production and studied how production changed over time. Figure 7 shows the global production of solar cells in 1990–2010 (data collected from: http://wenku.baidu.com/link?url=D0WvEaX9VxDvYloBTBRAG7U0BUMsS43MJRhd8wEHahPsU-4W94HKxKvB1Id1SrKb60FFRyp2NIicvDyqv6mmAr655pIJ2WIXRki7joVeUsW).
Production was only 46.5 MW in 1990 but grew exponentially, reaching 15200 MW by 2010 (a factor of ∼327). We calculated the Pearson correlation coefficient between the number of emerging clusters in 1991 to 2003 and the growth rate of production at the same time points. The result is 0.735, which shows a fairly high correlation between the two series. Thus, we could infer that the technology market influenced technological innovation to a certain degree, and vice versa.
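The growth figures can be made concrete with a small Python sketch. The text does not define “growth rate”, so the year-over-year definition used here is an assumption, as are the function names; only the two endpoint values (46.5 MW and 15200 MW) come from the text, and the three-value series is a toy example.

```python
def growth_rates(series):
    """Year-over-year growth rates: (x[t] - x[t-1]) / x[t-1]."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def overall_growth(first, last, years):
    """Total growth factor and the implied compound annual growth rate."""
    factor = last / first
    cagr = factor ** (1.0 / years) - 1.0
    return factor, cagr


# Endpoints reported in the text: 46.5 MW (1990) and 15200 MW (2010).
factor, cagr = overall_growth(46.5, 15200.0, 2010 - 1990)
print(round(factor, 1), round(cagr, 3))

# Toy series, invented for illustration only: 10% growth in each step.
toy_rates = growth_rates([100.0, 110.0, 121.0])
print([round(g, 3) for g in toy_rates])
```

On the two reported endpoints this gives a factor of about 327, which corresponds to compound growth of roughly 34% per year over the 20 years.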
Conclusion
To find technological patent clusters that are not only novel but also persistent, we proposed a new overlap method that integrates the direct citation model and the genetic approach. Applying this method to the technological domain of Solar PV identified 15 clusters that met the set criteria for emergence.
The methodology and the selection of emerging and influential topics described above obviously required a number of decisions to be made. As an example of a process decision, we used two local models (direct citation and the genetic approach) to identify emerging and influential topics. On the analysis side, we had to interpret the interplay between science, innovation, and exogenous events when characterizing the emergence of a topic. Further research discussing these choices and their ramifications, as well as more general observations, would be useful.
Our results also document the growth of technological patents, scientific papers and technical yield. Note, however, that our results indicated no obvious lead/lag relationship between science and technology. Thus one cannot simply infer from Figs. 6 and 7 that science or the market leads or lags technology, or vice versa. The high Pearson correlation coefficients only show that the series are correlated, and can be considered as indicating, not establishing, the existence of closer links between technology, science and the market.
In addition, the general applicability of the direct citation model used here is limited by some factors. For example, we found that it might not be suitable for small technology domains, since the long-duration threads might cause some creative patents to be overlooked. A method better suited to small domains is therefore a subject for future work.
Acknowledgments
The authors acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 71571022). The authors also thank the anonymous reviewers for their helpful comments and suggestions.
