Abstract
With the proposal and development of the Material Genome Engineering program, artificial intelligence has played a significant role in accelerating the research and development of new materials. In the field of electrical engineering materials, high-throughput experimental and computational methods generate a huge amount of data, which poses new challenges for managing material data scientifically and efficiently. Database technology has therefore become a hot topic for material scientists and engineers. This paper presents a comprehensive overview of the development, demand analysis, and application of database technology for electrical engineering materials, and discusses the existing problems and future development trends of such databases. Compared with databases for other material classes, such as energy materials, catalytic materials, and biomedical materials, the electrical material database still has a long way to go in platform construction, management and operation, and practical application. However, driven by governmental support and market demand, the construction of electrical material databases will gradually improve and play an important role in data-driven research on new materials.
Keywords
Introduction
The discovery and scientific application of new electrical engineering materials strongly drive original innovation in high-tech electrical products. The excellent mechanical, electrical, and other properties of electrical engineering materials are closely related to their atomic-level structure and chemical composition. Exploring the structure of materials at the atomic and electronic-configuration levels is therefore of great significance for understanding their physical nature. The construction of high-throughput material research platforms is based on the basic concept of material genetic engineering, and aims to change the research mode, develop new materials, and reveal the mechanisms and laws that govern material properties.
The Materials Genome Initiative (MGI) was proposed in 2011 by President Barack Obama and was quickly followed by China and other countries around the world. It is intended to accelerate material innovation through seamless integration and synergy between calculations, experiments, and theory. Material database technology is a key technology for realizing the MGI, and it has a pivotal position in the data-driven mode of accelerating material research and development. Before the MGI was put forward, material databases already provided data storage, management, and retrieval. Under the concept of the MGI, the role and status of the material database have become more prominent: on the one hand, high-throughput experiments and high-throughput calculations rapidly generate large amounts of experimental and simulation data, which require material big data technology to manage them fully; on the other hand, material database technology can be closely combined with high-throughput calculation for data mining, and can also guide material experiments according to the results of data analysis and prediction. Under the guidance of material big data technology, the material database is developing in an open, automatic, and intelligent direction.
Demand analysis of electrical material database
Significance of material database technology for the research and development of electrical materials
MGI represents a transformative model of material research and development, revealing the key factors affecting material properties. Achieving this goal requires a synergy of high-throughput computing, experiments, and big data technology. Two research methods, bottom-up and top-down, are profoundly connected to material big data technology, supported by experiment and calculation, respectively. First of all, the bottom-up scientific discovery is based on the experiment and then rises to the general law. Material high-throughput experiment is a breakthrough in material preparation and characterization. Material high-throughput experiment not only improves the efficiency of the material experiment but also improves the yield of material data. Secondly, top-down approaches use first principles calculations to simulate and predict material properties from physical theories. As material data generation accelerates, effectively managing and utilizing this data has become crucial, so material database technology is of great significance in the research and development of new materials.
With the development of the MGI, key technologies such as data mining based on materials databases are widely used in electrical materials fields, such as conductor materials, insulating materials, magnetic materials, and energy storage materials. The key properties of these materials are closely related to the interaction of their atoms and electrons. For example, the band structure of a material reflects the interaction between its atoms and electrons and is closely related to the conductivity of conductor materials. The band structure can be calculated from first principles using density functional theory (DFT), while high-throughput computing enables DFT calculations to produce a large amount of band structure data at an acceptable time cost. In addition, first-principles calculations provide other vital data at the atomic level, including formation energy, charge density, piezoelectric properties, and density of states. Although there is currently no specialized material genetic engineering database for electrical engineering materials, researchers have established many open-access and user-friendly material databases based on high-throughput calculation, from which the key properties of electrical materials can be obtained in large quantities. These data can be managed and fully utilized through databases and their key technologies, which will greatly promote the research and development of electrical materials.
Electrical engineering material database requirements and key technology analysis
Electrical engineering material databases and big data technology are the supporting platforms and key technologies of electrical material genetic engineering. They accumulate data through high-throughput computing and high-throughput experiments, and integrate with data mining technology to serve the design of new electrical materials. However, because electrical engineering material systems, preparation and detection technologies, and application scopes are all diverse, electrical material data differ greatly in type and are multi-source and heterogeneous, so database construction faces many problems of data unification. Thus, identifying effective methods for data collection, storage, and display has become a key challenge for the genetic engineering database of electrical engineering materials. Additionally, determining how these databases will be used in the future is a critical issue.
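The data unification problem described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that different sources report electrical conductivity in different units and that the database stores a single common unit (S/m). The record fields, the `unify` function, and the list of accepted units are hypothetical and are not taken from any existing database.

```python
from dataclasses import dataclass

# Conversion factors to the common storage unit (S/m).
# The set of accepted input units is an assumption of this sketch.
TO_SIEMENS_PER_METER = {
    "S/m": 1.0,
    "S/cm": 100.0,
    "mS/cm": 0.1,
}


@dataclass(frozen=True)
class ConductivityRecord:
    """Unified record: one material, one value in S/m, one provenance tag."""
    material: str
    conductivity_s_per_m: float
    source: str  # e.g. "experiment" or "calculation"


def unify(raw: dict) -> ConductivityRecord:
    """Map one heterogeneous source record onto the common schema."""
    unit = raw["unit"]
    if unit not in TO_SIEMENS_PER_METER:
        # Reject unknown units instead of silently storing wrong values.
        raise ValueError(f"unsupported unit: {unit!r}")
    return ConductivityRecord(
        material=raw["material"].strip(),
        conductivity_s_per_m=float(raw["value"]) * TO_SIEMENS_PER_METER[unit],
        source=raw.get("source", "unknown"),
    )
```

Rejecting unknown units rather than guessing is a deliberate choice here: in a multi-source database a silent unit error is harder to detect later than a failed import.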
With the ongoing advancement of industry and the economy, the evolution of high-tech power products requires advanced electrical engineering materials, whose performance in mechanical and electrical domains is closely related to their atomic hierarchy structure and chemical composition. The electrical engineering materials database platform can collect information on various aspects such as material structure, performance, and process parameters. Combined with the material big data technology of material genetic engineering, it enables in-depth exploration of material structure at the atomic level and the essence of material properties at the electronic configuration level. Under the concept of material genetic engineering, these material databases are increasingly integrating with machine learning, first-principle computation, and other material high-throughput computations, evolving into material high-throughput computing platforms. This represents a tangible shift towards data-driven research and development in the field of applied material database.
Features and functions of the electrical engineering material database
The public platform of the electrical engineering material database is the basic structure of electrical engineering material system engineering. It should not only become the electronic knowledge treasure house of electrical engineering materials but also provide strong support for the development and design, calculation simulation, evaluation, and characterization of new electrical engineering materials through data mining, integration, and sharing. The characteristics and functions of the public platform of electrical engineering materials database shall include:
It should have the function of data management, including the application of standards and specifications, data collection, evaluation, storage, and integration, to form a complete database platform;
It should have the functions of data service, including efficient data retrieval, mining, analysis, release, collaboration, etc., as well as the secondary development function of the database, which can realize the transparent exchange of data between different software applications and systems;
It should have the function of sustainable development, including the function of intellectual property protection and data authority management, and ensure the healthy and sustainable development of the database through the development of sharing incentive mechanism.
Construction of electrical engineering material database platform
Basic principles for the construction of the public platform of the electrical engineering material database:
Centralized combined with distributed
General-purpose combined with dedicated
Basic combined with thematic (professional)
Civil combined with military
The construction content of the database public platform
The public platform of the electrical engineering materials database will integrate existing domestic data resources and the achievements of electrical engineering materials system projects, and will supplement incomplete data sets, in order to build a public material database platform that serves national scientific and technological innovation and improves international competitiveness. The standard system of the electrical engineering material database platform should be established under the framework of material genetic engineering. This includes standards for data quality assessment, data models, and metadata, as well as software-related standards for electrical engineering materials, such as interface, integration, test, interoperability, data transmission, release, access control, and data update standards.
At present, no specialized open-access databases are specifically tailored for electrical engineering materials. However, various existing material databases encompass a range of electrical materials, including conductors, insulators, magnetic materials, and energy storage materials. Consequently, this section aims to concisely review and list the databases pertinent to electrical materials (see Table 1).
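As a rough illustration of what a metadata standard could enforce at import time, the following Python sketch checks an entry for required metadata fields. The field names in `REQUIRED_METADATA` are hypothetical choices made for this example; the text above does not specify which fields such a standard would contain.

```python
# Metadata fields that every entry must carry (hypothetical example set).
REQUIRED_METADATA = (
    "material_id",
    "property_name",
    "value",
    "unit",
    "method",   # e.g. "DFT" or the name of a measurement technique
    "source",   # e.g. a literature reference or a project identifier
)


def missing_metadata(entry: dict) -> list[str]:
    """Return the required fields that are absent or empty in an entry."""
    return [key for key in REQUIRED_METADATA if entry.get(key) in (None, "")]
```

An import routine could refuse or flag any entry for which `missing_metadata` returns a non-empty list, which is one simple form of data quality assessment.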
Data comparison of the major material database in the field of electrical engineering materials
First-principles calculations based on density functional theory (DFT) can predict material performance and facilitate accelerated design. With the continuous optimization and development of DFT methods, high-throughput DFT calculations enable the generation of extensive databases of predicted material properties at an acceptable time cost. These databases allow direct searches based on specific performance criteria and are well suited to integration with data mining for new materials development. Notable examples of large-scale databases generated using high-throughput DFT calculations include the Materials Project [1], AFLOWLIB [2], and OQMD.
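A search based on performance criteria can be sketched with Python's built-in `sqlite3` module. The table layout, the column names, and the four rows below are placeholders invented for this illustration; they are not real data and do not reflect the schema or query interface of the Materials Project, AFLOWLIB, or OQMD.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding a few placeholder entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE materials ("
        "formula TEXT PRIMARY KEY, "
        "band_gap_ev REAL, "
        "formation_energy_ev_atom REAL)"
    )
    # Placeholder rows for illustration only, not computed or measured values.
    rows = [
        ("A2B", 0.0, -0.35),
        ("AB3", 1.8, -1.10),
        ("ABC", 3.2, 0.05),
        ("A3C", 2.4, -0.60),
    ]
    conn.executemany("INSERT INTO materials VALUES (?, ?, ?)", rows)
    return conn


def search(conn: sqlite3.Connection, min_gap: float, max_formation_energy: float) -> list[str]:
    """Return formulas meeting both criteria, sorted by formula."""
    cursor = conn.execute(
        "SELECT formula FROM materials "
        "WHERE band_gap_ev >= ? AND formation_energy_ev_atom <= ? "
        "ORDER BY formula",
        (min_gap, max_formation_energy),
    )
    return [row[0] for row in cursor.fetchall()]
```

For example, `search(build_demo_db(), 1.5, 0.0)` selects the placeholder entries with a band gap of at least 1.5 eV and a non-positive formation energy.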
The Materials Project (
The AFLOWLIB Database (aflowlib.org) is a comprehensive repository of material properties derived from high-throughput computational data, developed by Duke University in 2011. It currently contains over 3,528,653 materials and more than 733,959,824 material properties, including 366,988 band structures. The AFLOWLIB data are predominantly generated using the AFLOW (Automatic Flow) high-throughput computing software [3]. To enhance data retrieval and mining, AFLOWLIB supports various applications such as AFLOW
The Open Quantum Materials Database (OQMD) (
The comparison between the formation energy predicted by DFT on the OQMD platform and the experimental values is presented. The black line represents perfect agreement between the two, while the thick red dashed line indicates the difference of one standard deviation between the predicted and experimental values, with the majority of data points falling within this range. The average difference is represented by the solid red line [6].
Beyond the aforementioned databases, there are other computational resources like the Computational Materials Repository (CMR) [10]. CMR is a modular, open-source computational materials database system that relies on free software products and technologies such as Python, Apache/PHP, and MySQL. This setup allows for the versatile, modular, and scalable collection, storage, dissemination, and analysis of computational material data.
Contrasting with computational material databases, traditional material databases primarily rely on experimental data or journal literature, such as ICSD database (
The Pauling File database (
The Atomly material database boasts world-class status and independent intellectual property rights. Initiated in 2018 and officially launched in 2020, it currently encompasses over 300,000 high-quality material data entries, reflecting a global standard in both scale and quality. The data primarily derive from the Atomly team’s high-throughput software and the cutting-edge first principles methods, processed on high-performance computers.
The Material Genetic Engineering Database (MGED) (
Efficient thermoelectric materials hold significant promise in energy harvesting and solid-state refrigeration devices. Improving these materials requires exploring the vast unknown composition space of compounds, aided by data mining of databases of known materials. In this area, researchers strive to establish thermoelectric materials databases to foster data-driven development models. For instance, Gaultois et al. [13] compiled a thermoelectric materials database from over 18,000 compounds reported in more than 100 publications, introducing a market concentration index to address element scarcity and supply risks. Similarly, Spark et al. [14] employed data mining and machine learning to analyze the thermoelectric properties of thousands of compounds, integrating these with DFT calculations to predict low thermal conductivity phases in unexplored ternary phase diagrams. Additionally, Chen et al. [15] performed data mining within the Materials Project database to create a new database of thermoelectric material properties, whose accuracy was corroborated by existing experimental data, aiding the discovery of novel thermoelectric materials.
Designing new energy storage materials involves integrating various material properties, which requires support from material big data technology. Qu et al. [16] used the Materials Project database to develop a molecular property calculation program, particularly for large-scale automated screening of battery electrolytes. This program calculated the ionization potential and electron affinity of 4,830 molecules with a high degree of accuracy. Standard validations confirmed that the program yields reliable molecular property data, thereby expediting the design and optimization of electrolytes and furthering fundamental electrolyte research.
In lithium-ion battery design, selecting appropriate lithium salts and additives requires screening multiple electrolyte materials. Hall et al. [17] employed the VAMP semi-empirical electronic structure software to perform a sequence of PM3 semi-empirical Hamiltonian calculations, geometry optimizations of neutral and charged species, and single-point energy calculations to obtain energy and affinity properties. Using the Pipeline Pilot workflow, they generated an anode SEI additive structure library containing 7,381 unique stereochemical structures and automatically performed quantum chemical analysis on it. The approach involves creating a material structure library and employing high-throughput computing technology to analyze the data, calculating battery performance metrics and material reaction energies. This virtual material data, a product of extensive calculations, facilitates data mining, assisting in the optimal design and selection of chemical components. The workflow diagram is shown in Fig. 2 [17]. This screening tool, which integrates first-principles and computational database methods, advances the development of electrolyte additives for lithium-ion batteries.
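The two screened quantities follow from total-energy differences between charge states of a molecule: the ionization potential is the energy of the cation minus that of the neutral molecule, and the electron affinity is the energy of the neutral molecule minus that of the anion. The Python sketch below is a minimal illustration of that arithmetic only; it is not the program of Qu et al. [16], and the function names and the screening window in `passes_window` are hypothetical.

```python
def ionization_potential(e_neutral: float, e_cation: float) -> float:
    """IP = E(cation) - E(neutral); energies in the same unit, e.g. eV."""
    return e_cation - e_neutral


def electron_affinity(e_neutral: float, e_anion: float) -> float:
    """EA = E(neutral) - E(anion); energies in the same unit, e.g. eV."""
    return e_neutral - e_anion


def passes_window(ip: float, ea: float, min_ip: float, max_ea: float) -> bool:
    """Hypothetical screening rule: hard to oxidize and hard to reduce."""
    return ip >= min_ip and ea <= max_ea
```

Whether the energies are taken at fixed or relaxed geometries (vertical or adiabatic values) is left open here, since the text does not state which convention was used.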
A materials discovery and optimization scheme is employed, utilizing high-throughput quantum chemistry analysis and virtual material screening [17].
Owing to their high energy density, reliability, and fast charging capability, lithium-ion batteries have been widely adopted across various sectors and have become a pivotal energy storage technology in modern life and industrial development. Sendek et al. [18] leveraged the Materials Project database to propose a new large-scale computational screening method for electrolyte materials that could meet battery performance standards, targeting all known lithium-containing compounds, since potentially excellent conductors may exist among the tens of thousands of such compounds. By analyzing electronic structure data in the Materials Project database, the researchers start from known materials that meet the requirements and use experimental data to train a machine learning model for ionic conductivity classification, which can then search the database on a large scale for electrolyte materials that have not yet been applied but may meet the application requirements. The screening efficiency of this model is significantly higher than that of DFT or experimental methods, as it can screen over 12,000 potential lithium-ion electrolyte materials within a few minutes. The method is shown in the flowchart in Fig. 3. Screening the Materials Project database against battery technology requirements such as high energy density, high stability, and cost advantages reduced the number of potential lithium-ion electrolyte materials from 12,831 to approximately 3,000. Subsequently, 40 crystal structures were selected from the ICSD as the training set, and a data-driven ionic conductivity classification model based on logistic regression was built to determine which candidate structures may exhibit fast lithium conduction. This screening reduced the list of candidates to 21 structures, which show promise as solid electrolytes.
In the screening process, the researchers eliminated 92.2% of the lithium-containing materials on the basis of structural stability, chemical stability, and low electronic conductivity, while screening for high ionic conductivity eliminated 93.3% of the remaining materials, showing the high efficiency of machine learning models in large-scale material screening.
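The classification step can be sketched as a plain logistic regression trained by gradient descent. The Python code below is a generic textbook implementation using only the standard library; it is not the model of Sendek et al. [18], and the feature vectors, learning rate, and decision threshold are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_fast_conductor(x, weights, bias, threshold=0.5) -> bool:
    """Label a candidate as a likely fast ion conductor (True) or not."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p >= threshold
```

In use, each training structure would be described by a vector of structural descriptors with a label of 1 for known fast conductors and 0 otherwise; the trained model is then applied to every candidate that survived the earlier stability and conductivity filters.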
A schematic diagram of the workflow for structure screening and model training [18].
Identifying fast lithium-ion conductors for solid electrolytes is crucial for enhancing the safety of next-generation lithium batteries. High lithium-ion conductivity and low electronic conductivity are key properties for lithium-ion batteries. Xiao et al. [19] utilized high-throughput technology to calculate the bond valence of materials, analyzing lithium-containing compounds from the ICSD database to pinpoint potential fast lithium-ion conductors. The researchers determined the potential structures of lithium ionic compounds using the bond valence method and DFT calculations, subsequently analyzing their dynamic properties and electronic structures.
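The bond valence method rests on a simple empirical relation: each bond of length R contributes a valence s = exp((R0 - R)/b), and the sum over all bonds around an ion should be close to its formal valence. The Python sketch below shows this relation only and is not the workflow of Xiao et al. [19]. The value b = 0.37 Å is the commonly used constant, while R0 depends on the ion pair and must be taken from tabulated parameters; the function names are hypothetical.

```python
import math


def bond_valence(distance: float, r0: float, b: float = 0.37) -> float:
    """Valence of one bond: s = exp((r0 - distance) / b), lengths in angstrom."""
    return math.exp((r0 - distance) / b)


def bond_valence_sum(distances, r0: float, b: float = 0.37) -> float:
    """Sum of bond valences over all neighbours of one ion."""
    return sum(bond_valence(d, r0, b) for d in distances)


def valence_mismatch(distances, r0: float, formal_valence: float = 1.0,
                     b: float = 0.37) -> float:
    """Absolute deviation of the bond valence sum from the formal valence."""
    return abs(bond_valence_sum(distances, r0, b) - formal_valence)
```

Evaluating the mismatch for a lithium ion placed at points on a grid through a structure gives a quick map of where lithium is comfortable, which is why the method is cheap enough for high-throughput pre-screening before DFT.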
Improving the stability of electrolytes at high temperatures can help improve the safety performance of batteries. Through molecular electronic structure theory methods, better organic solvent molecules suitable as battery electrolytes can be identified. Korth et al. [20] focused on discovering polar aprotic organic solvents with greater electrochemical stability than vinyl carbonate. The researchers developed a calculation method for systematic and large-scale screening of electrolyte composition. Based on 100,000 data entries in the existing database, they pinpointed 83 candidate molecules using over 46,000 DFT calculations. This method has proven effective in categorizing compounds considered novel electrolyte solvents in the past decades.
Choosing electrode materials with excellent performance can improve the capacity of lithium-ion batteries. Kirklin et al. [21] combined DFT with grand canonical linear programming (GCLP) and proposed a powerful method for the automatic analysis of ground state thermodynamics. The researchers calculated multiple thermodynamically stable lithiation reactions, including all possible thermodynamically stable ternary conversion reactions of these transition metal compounds, and calculated the reaction potential, volume expansion, and capacity of each reaction. The calculations were based on 291 compounds from the DFT database, including all transition metal silicides, phosphides, and stannides found in the ICSD database. Candidates with excellent anode characteristics were selected according to gravimetric capacity, volumetric capacity, cell potential, and volume expansion. Through this high-throughput calculation method, the researchers identified several anode materials with potentially excellent properties, including CoSi2, TiP, and NiSi2, all showing significantly better performance than graphite.
Heusler structures are widely used in various applications, such as superconductors, thermoelectrics, and shape memory alloys. The space of available structures is large, and the discovery of new multicomponent crystal materials is a complex task. Kim et al. [22] demonstrated a method to significantly accelerate material discovery by using a machine learning model trained on DFT data from OQMD. The researchers trained on the entire OQMD database, a huge dataset that includes a wide variety of structure types, not just the Heusler structure. Notably, the model's accuracy improved when both more quaternary Heusler data and more diverse structural data were included in the training set. The researchers used the random forest algorithm proposed by Breiman because it is robust to overfitting and can be trained on large datasets. Guided by the predictions of the machine learning model, the investigators searched for novel stable quaternary Heusler compounds. Out of 2,278 components screened, 961 from the OQMD were analyzed for stability via DFT, excluding those with rare earth elements due to their limited exploration. Of the 303 non-rare earth compounds investigated, 55 candidate quaternary Heusler compounds were found to be stable. This study's screening process, depicted in Fig. 4 [22], illustrates the potential of combining OQMD and machine learning to quickly identify new stable materials across various compounds.
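The screening metrics named above follow from standard relations, sketched here in Python. The gravimetric capacity is n·F/(3.6·M) in mAh/g for n lithium per formula unit of a host with molar mass M, the average potential is approximated by the negative reaction energy per lithium, and the volume expansion is the relative volume change. This is a generic illustration under those textbook assumptions, not the code of Kirklin et al. [21]; the function names and the choice of the unlithiated host mass as the basis are assumptions of this sketch.

```python
FARADAY_C_PER_MOL = 96485.33212  # Faraday constant


def gravimetric_capacity_mah_per_g(n_li: float, host_molar_mass_g_per_mol: float) -> float:
    """Capacity per gram of unlithiated host; 1 mAh = 3.6 C."""
    return n_li * FARADAY_C_PER_MOL / (3.6 * host_molar_mass_g_per_mol)


def average_voltage_v(reaction_energy_ev: float, n_li: float) -> float:
    """Average potential vs. Li metal for a reaction energy per formula unit.

    reaction_energy_ev = E(lithiated) - E(host) - n_li * E(Li metal), in eV,
    with the total energy used as an approximation to the free energy.
    """
    return -reaction_energy_ev / n_li


def volume_expansion(v_host: float, v_lithiated: float) -> float:
    """Relative volume change on lithiation (0.10 means 10 %)."""
    return (v_lithiated - v_host) / v_host
```

As a sanity check, one lithium per C6 unit (molar mass about 72.07 g/mol) gives roughly 372 mAh/g, the familiar theoretical capacity of graphite that serves as the baseline for the candidates listed above.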
The screening process for quaternary Heusler compounds [22].
Material genetic engineering, as an innovative paradigm in material research and development, has effectively accelerated the process of materials from research to practical application, curtailing both time and costs involved in research and development, and yielding substantial achievements, especially in the field of electrical engineering materials. Concurrently, the high-throughput technology underpinning this approach generates extensive data for electrical material research, posing challenges for efficient and comprehensive data management and utilization. The establishment of relevant databases for electrical engineering materials research will effectively accelerate the exploration and development of new high-performance electrical materials and will lay the foundation for the introduction of artificial intelligence technologies such as machine learning and data mining in the field of electrical materials research.
Establish the electrical material database platform. Data forms the cornerstone of material database construction. Since the inception of “material genetic engineering”, there has been a surge in material data, creating an urgent need for a database platform for electrical engineering materials characterized by high-quality data, unified standards, and complete types. Electrical engineering material data are mainly obtained through experiment and calculation, and can also be extracted quickly from the literature through text mining and natural language processing. In the process of data acquisition, it is necessary to pay attention to the differences caused by different process conditions, simulation parameters, and measurement instruments, and to record them clearly.
Establish a safe and efficient database sharing mechanism. In the era of big data, many scientific research institutions and enterprises maintain proprietary databases. However, because of industry competition, privacy and security concerns, and other restrictions, these entities are reluctant to share their data, leading to “data islands” and significant challenges in data integration. To address this, federated learning, a distributed machine learning approach, offers a solution. It allows model parameters to be shared with third-party cloud servers without compromising data privacy, enabling secure data collaboration. This mechanism can be applied not only to establishing databases but also to machine learning training and other collaborative endeavors.
Establish a mechanism for sustainable development of the database management. Due to concerns about data security and unequal contributions and benefits, scientific institutions and enterprises often hesitate to participate in shared databases.
After the establishment of the database platform, it is necessary to formulate a complete system of operation, maintenance, and development to give full play to the sustainable development function of the database, including intellectual property protection, authority management, and sharing incentive mechanisms. Governments should enact robust laws and promote a comprehensive intellectual property framework for material databases, integrating advanced privacy protection technologies and appointing experts for database authority management and maintenance. Additionally, exploring sharing incentive mechanisms, optimizing resource distribution, and advocating a “more work, more gain” approach are vital. Ensuring safe, efficient, and transparent access to multi-source data, supporting data scalability in the specialized electrical materials database, and facilitating differential data sharing will contribute to the database's sustainable development.
Strengthen the cooperation among high-throughput computing, high-throughput experiment, and database platforms. Material genetic engineering aims to shorten the research and development cycle and reduce costs through cooperation among the three platforms of high-throughput computational technology, experimental technology, and databases. The three platforms complement each other and develop together. In computation and experiment there are still difficulties, such as the need to develop innovative core computing software and the lack of advanced characterization instruments. Only by advancing high-throughput computing and experimental technology, and breaking through the bottlenecks of related technologies, can the shared database of high-performance electrical materials be enriched; at the same time, the material database will in turn provide data guidance for high-throughput computing and experiments.
Strengthening cooperation and sharing across platforms is essential to fully realize the potential of material genetic engineering.
Promote the establishment of the data-driven electrical engineering material research and development model based on the electrical material database. The overall goal of the electrical materials database platform is to build a public database platform that is oriented by application requirements and effectively serves their implementation. The database of electrical materials cannot exist in isolation after it is built. It needs to be closely combined with the other technologies of material genetic engineering to truly help solve practical problems, for example by combining it with machine learning and other technologies to establish predictive models of electrical materials and accelerate the development of new high-performance electrical materials and technologies. The efficient application of database and data mining technology, together with interactive, multi-level, and multi-objective collaborative research between material experiment and simulation, and demonstration applications in material composition design, process optimization, performance prediction, and other processes, will lay the foundation for the engineering application of electrical material data.
In addition, future development directions also include:
Integration with High-Throughput Technology: Expect an increase in data from high-throughput computational and experimental methods, demanding more efficient database management.
Machine Learning and AI Integration: Advanced machine learning and AI will increasingly be used to predict and optimize material properties, making databases more dynamic and intelligent.
Development of Open and Automated Databases: Future databases will likely be more open, automated, and intelligent, with user-friendly interfaces and improved data sharing and analysis features.
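The federated learning idea discussed above for overcoming data islands, in which participants exchange model parameters rather than raw data, can be sketched in Python with federated averaging of a simple linear model. This is a toy illustration under strong assumptions (a one-variable linear model, honest participants, no encryption or secure aggregation); all function names and hyperparameters are hypothetical.

```python
def local_update(weights, data, lr=0.05, epochs=50):
    """One participant refines (w, b) for y = w*x + b on its private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average parameters, weighted by local dataset size."""
    total = sum(sizes)
    n_params = len(updates[0])
    return tuple(
        sum(update[k] * size for update, size in zip(updates, sizes)) / total
        for k in range(n_params)
    )


def federated_train(datasets, rounds=100):
    """Alternate local training and server averaging; raw data never moves."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights
```

Only the tuples returned by `local_update` leave each participant, so the server in `federated_train` never sees individual measurements; a production system would add protections such as secure aggregation on top of this exchange.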
This paper presents a comprehensive summary of the development status, application cases, and demand analysis of electrical engineering material databases at home and abroad, and discusses their future development directions. The summary and analysis show that material genetic engineering, as a new research and development approach, has achieved some results toward halving both the research and development cycle and the research and development cost of new materials. As one of the three key technologies of material genetic engineering, database technology is an essential part of accelerating the design and development of new materials. Developing electrical engineering material database technology helps to accelerate the discovery of domestic high-performance electrical materials and to ease the gap between the development of electrical equipment and the research and development of high-end electrical materials. As more and more experts and scholars recognize the importance of databases and take part in the relevant research, electrical engineering material database technology has broad prospects.
Author contributions
PS. conceived the idea and participated in the paper writing; SL. participated in the research design and discussion; LX. and HB. participated in the data collection. H.L. and QX. were involved in the analyses of data; BW. organized the research project. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Technology project of State Grid Smart Grid Research Institute Co., LTD (525500200052).
Footnotes
Conflict of interest
The authors declare no conflict of interest.
