Abstract
As the earliest form of writing and historical cultural heritage in China, oracle bone inscriptions carry immeasurable density of historical information and potential for academic research. However, due to the age and preservation environment, the quality of many oracle bone images is severely impaired, resulting in problems such as text edge wear, loss of details, and blurred handwriting, which greatly limits the ability of scholars to accurately interpret the contents of the oracle bones and explore them in depth. In this paper, we propose a generative adversarial network (GAN)-based image enhancement and calibration system, which realizes multi-level information recovery and geometric correction from micro to macro by applying an advanced deep learning model to capture the minute features, delicate textures and original layouts of oracle bone images. The model in this paper consists of two main parts, that is, generator network G and discriminator network D. The task of generator G is to generate images consistent with the real data distribution based on random noise variables and possibly additional condition information (e.g., category labels, attributes, etc.), which should mimic or satisfy as much as possible the features of the real image under the given conditions. The discriminator D, on the other hand, is responsible for receiving the input real image or the fake image generated by the generator and outputs a probability value that reflects the probability that it considers the image to be a real sample, that is, the truthfulness or trustworthiness of the image. The model in this paper achieves the functions of image quality assessment and authenticity judgment, image recovery and optimization, image enhancement and proofreading by means of adversarial training. 
Experimental evaluations are carried out on two representative oracle image datasets, OBI-100 and OBI-300. Comparison with other image enhancement and reweighting methods verifies the effectiveness and superiority of the proposed method in improving the clarity and readability of oracle images, as well as in accurately recognizing oracle characters and extracting oracle information. The method provides a new technical means for the research and inheritance of oracle bone inscriptions.
Introduction
Among the treasures of human civilization, oracle bone inscriptions, as the earliest form of Chinese writing and an important carrier of historical and cultural heritage, carry an immeasurable density of historical information and potential for academic research.1
According to incomplete statistics, more than 150,000 oracle bones survive. However, a considerable portion of them have been seriously damaged by age (they date back to the Yin-Shang period, from the 14th to the 11th century BC) and by long-term preservation conditions. The resulting image degradation appears as severe wear of the text edges, loss of detail, and overall blurring of the handwriting, which greatly limits the ability of scholars to accurately interpret and deeply explore the contents of the oracle bones.2
According to related research, image quality problems limit the recognition rate of more than 30% of the materials in existing oracle bone databases, which seriously affects the research progress of paleography.
Figure 1. Oracle image flow model.
Oracle bone inscriptions, a unique form of written record from the late Shang Dynasty to the early Western Zhou Dynasty in ancient China, were mainly inscribed on tortoise shells and animal bones. They not only carry a wealth of historical information, but also serve as an important source for the evolution and development of Chinese characters. To date, a huge number of oracle bones have been discovered, estimated at more than 150,000 pieces, scattered in museums and private collections around the world. Most are concentrated in the National Museum of China, the Yinxu Museum in Anyang, Henan Province, and the National Palace Museum in Taiwan, among other institutions. The quality of the corresponding oracle bone images varies, ranging from clearly recognizable complete divinations to fragments that are blurred or even badly damaged due to age. The inscriptions cover a wide range of subjects such as rituals, astronomy, military affairs and agriculture, and provide valuable information for the study of ancient social life, religious beliefs, and the development of writing. With the progress of science and technology, such as the application of high-resolution imaging and digital processing, the protection, arrangement and study of oracle bones have deepened, presenting this ancient written heritage more clearly to modern people.
At present, the existing oracle bone image recognition process in China is specifically shown in Figure 1.
Figure 1 shows the workflow of the oracle image flow model. First, a wavelet decomposition divides the image into two parts: a low-frequency image and a high-frequency image. These two parts are then processed separately. Histogram equalization is applied to the low-frequency image to make its gray-level distribution more uniform, thus increasing the overall contrast of the image. Threshold denoising is applied to the high-frequency image to remove unnecessary detail noise while maintaining image clarity. After that, the processed low-frequency and high-frequency images are fused at the pixel level, so that the result integrates the characteristics of both. Finally, image reconstruction generates the output image and ends the process.
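The decompose, denoise and reconstruct steps of this workflow can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet and soft thresholding applied to a single image row; the source does not say which wavelet or threshold rule the existing pipeline uses, and the function names are hypothetical. The low-frequency enhancement step is left out to keep the sketch short.

```python
def haar_decompose(row):
    """One-level Haar transform of an even-length row: returns (low, high)."""
    low = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    high = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return low, high


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero; magnitudes below t become zero."""
    shrunk = []
    for c in coeffs:
        magnitude = max(abs(c) - t, 0.0)
        shrunk.append(magnitude if c >= 0 else -magnitude)
    return shrunk


def haar_reconstruct(low, high):
    """Inverse of haar_decompose: interleave low + high and low - high."""
    row = []
    for l, h in zip(low, high):
        row.extend([l + h, l - h])
    return row


def denoise_row(row, t):
    """Decompose, threshold the high-frequency band, then reconstruct."""
    low, high = haar_decompose(row)
    return haar_reconstruct(low, soft_threshold(high, t))
```

With a threshold of zero the row is reconstructed exactly. A larger threshold suppresses small pixel-to-pixel fluctuations while keeping strong edges, which is the behavior the high-frequency branch relies on.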
In recent years, significant breakthroughs of deep learning in image analysis and processing, such as the excellent performance of convolutional neural networks (CNNs) in feature extraction and the wide application of generative adversarial networks (GANs) in image restoration and reconstruction, have provided innovative ideas and technical support for solving this problem.
Therefore, how to effectively improve the quality of oracle bone images with modern technology, especially deep learning, and how to calibrate and enhance them accurately, has become a key bottleneck in ancient script information processing that urgently needs to be broken through.3 This paper is dedicated to developing an oracle bone image enhancement and calibration system based on deep learning, which realizes multi-level information recovery and geometric correction from the micro to the macro level by using advanced deep learning models to capture the tiny features, delicate textures and original layouts of oracle bone images. We aim to substantially improve the recognition accuracy and clarity of oracle bone images through high-precision feature extraction, refined trace restoration, and rigorous geometric correction, so as to bring the study of oracle bones into a new stage of refinement.4
It is expected that this method can not only effectively protect and respect the integrity of the original information of historical relics and recover the original writing form of oracle bones to the greatest extent, but also provide more reliable and detailed research materials for the majority of oracle bone researchers and help them to deeply excavate the wisdom of ancient civilization. On this basis, this study will further promote the deepening and expanding of oracle bone research, and is expected to drive the whole ancient Chinese writing research toward the direction of higher precision and broader vision.
Literature review
Bouchard et al. 5 developed Diviner, a self-supervised learning-based oracle bone collation assistant that intelligently matches and compares a huge number of oracle bone images to discover undiscovered eclipses and embellishments. Their article concludes that Diviner has pioneered a new research paradigm of artificial intelligence and human expert collaboration (AI + HI) for oracle bone collation, and looks forward to future research directions. Oracle bone script resembles graphics in that both use visual symbols to convey meaning.6 In addition, machine translation and speech recognition are technologies that deal with language, and oracle bone script is a written form of the Shang Dynasty language, so its interpretation belongs to the same family of problems. Knowledge mapping, a logical approach to presenting a body of knowledge, is likewise useful for specialized fields such as ancient writing research.7
Research on oracle bones faces two difficulties. On the one hand, oracle bones contain a wide variety of complex glyphs, including many rare or low-frequency glyphs that occur only once. Because the number of samples is limited, a general-purpose recognition model may not learn the features of these glyphs adequately during training, resulting in lower recognition accuracy on unknown or rare glyphs.8,9 Variant glyphs are also prevalent in oracle scripts: different writing styles of the same character may vary significantly, which is a challenge for machine learning models that rely on a large number of standard samples, since a model needs enough variant samples to adapt to such diversity. On the other hand, the distribution of the number of different glyphs in an oracle dataset is usually extremely uneven. Some commonly used characters have a large number of examples to learn from, while a large number of rare characters have only a few or even a single example. This long-tailed distribution may cause the model to over-fit high-frequency characters and under-learn low-frequency characters (including variant characters), thus affecting the overall recognition effect. The uneven distribution is also reflected in non-character characteristics such as writing direction, degree of wear and tear, and preservation status, which likewise increase the difficulty of recognition.
By contrast, research results on image enhancement and reweighting are more abundant. Fu et al. 10 summarized the development history, classification, evaluation indexes and application areas of image enhancement and reweighting methods and analyzed them systematically, providing a comprehensive perspective for subsequent research. Gao et al. 11 introduced a deep learning-based image enhancement and calibration weighting method, which uses convolutional neural networks for feature extraction and transformation, realizes adaptive adjustment of image quality, content, style and other aspects, and improves the effect and efficiency of image enhancement and calibration weighting.
An image enhancement and calibration method based on multi-scale feature fusion has also been proposed,12 which optimizes image details, texture, color and other aspects by fusing features at different scales, while taking into account the laws of human visual perception to improve the quality of image enhancement and calibration. A further method based on the attention mechanism13 dynamically assigns feature weights to different regions or levels by introducing attention weights, realizing precise control of image structure, content and style; the attention mechanism is also used to capture the impact of factors such as target changes or noise interference on image quality.
To sum up, research dedicated to oracle image enhancement and proofreading systems remains scarce. This paper is therefore committed to developing an oracle image enhancement and proofreading system based on deep learning, so as to bring oracle research into a new stage of refinement.
Modeling
Model functions
The core function of the generative adversarial network (GAN)-based image enhancement and correction technique studied in this paper mainly includes two aspects: (1) Image quality assessment and authenticity judgment: the method can effectively analyze the image after suffering from target content changes, noise interference, or other degradation factors, and quantify the authenticity performance or credibility of the image through some evaluation mechanism or loss function.14,15 Specifically, it can be used to determine whether the processed image is as close as possible to the original undamaged image in terms of visual effect, or whether it meets the expected editing, enhancement or restoration standards. (2) Image restoration and optimization: using the generator part of the GAN, the model is able to extract information from the latent space in a learning manner and reconstruct or improve the quality of the original image accordingly.16,17 By learning from a large number of real images, the generator can realize the refinement of low-quality, blurred or damaged images and output clearer, higher-resolution and more detailed images to meet specific application requirements, such as medical image enhancement, old photo restoration or visual art creation scenarios.
Modeling principles
A conditional generative adversarial network (CGAN) follows the basic principles below. The specific structure of the model is shown in Figure 2.
Figure 2. CGAN model structure.
Figure 2 illustrates the basic structure of a conditional generative adversarial network (CGAN). (1) Structural composition: CGAN consists of two main parts, the generator network G and the discriminator network D. The task of the generator G is to generate images consistent with the real data distribution from random noise variables and additional conditional information (e.g., category labels, attributes, etc.); the generated images should mimic the characteristics of real images under the given conditions as closely as possible.18,19 The discriminator D receives either a real image or a fake image produced by the generator and outputs the probability that the image is a real sample, that is, a measure of the authenticity or credibility of the image.
Here x denotes the input image, y the real image, and z the random noise or latent variable.20
(2) Adversarial training process: CGAN training is a typical zero-sum game. The generator learns to produce increasingly realistic images so that the discriminator can no longer reliably distinguish real images from generated ones, while the discriminator keeps improving its classification ability in order to tell them apart. The two networks are optimized alternately over many iterations, which ultimately improves the performance of the whole model.
(3) Loss function design: The CGAN loss function usually has two parts, a generator loss and a discriminator loss. For the discriminator D, the loss is usually the binary cross-entropy (BCE) loss, which measures its ability to correctly distinguish real images from fake ones.21 Because conditional information is present in CGAN, both loss terms are conditioned on it: the discriminator scores an image together with its condition rather than the image alone.
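For reference, a widely used form of the conditional adversarial objective can be written with the symbols x (input image), y (real image) and z (noise) used in this section. This is the standard formulation from the CGAN literature, given here as an assumption; the exact variant used in this paper may differ.

```latex
\mathcal{L}_{\mathrm{CGAN}}(G, D) =
  \mathbb{E}_{x, y}\bigl[\log D(x, y)\bigr]
  + \mathbb{E}_{x, z}\bigl[\log\bigl(1 - D(x, G(x, z))\bigr)\bigr],
\qquad
G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{\mathrm{CGAN}}(G, D)
```

The first term rewards D for assigning high probability to real pairs, the second rewards D for rejecting generated images, and G is trained to minimize the same quantity, which is what makes the game zero-sum.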
Model implementation
In the generative adversarial network (GAN)-based image processing framework, the entire optimization process can be divided into two closely related phases. The first is the image enhancement and calibration phase. In this phase, the generator G produces a series of diverse image versions by simulating different change states of the target object or introducing noise interference. The discriminator D then evaluates the realism of these generated images and their consistency with the original data distribution. In this way, the model selects, among the many candidate images, the one that is closest to the real-world representation and has the best quality as the final output. The second is the image recovery and improvement phase, in which the generator G is further trained to improve the quality of its generated images, aiming to produce images with higher clarity and more detail that fulfill the requirements of a particular application.23,24 The discriminator D again plays a key role by evaluating the realism and performance of the generated images, ensuring that the selected image is not only visually realistic but also meets the predefined criteria and requirements.
The specific flow of the model implementation in this paper is shown in Figure 3 and proceeds as follows: (1) Input an original image x and a type t of target variation or noise disturbance, including brightness, contrast, color, blur, noise, etc. (2) Use G to generate a series of images.
Figure 3. Specific flow of model implementation.
Figure 3 depicts a specific flow of a model implementation. First, starting from the original image x, a series of sub-images are generated by the target change vector; then, a probability mapping is generated based on the desired effect q and probability values; finally, the image with the highest probability is selected as the optimal image.
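The generate, score and select flow of Figure 3 can be sketched as follows. The generator and discriminator here are toy stand-ins (a brightness shift and a score based on mean gray level), not the networks of this paper; all names are hypothetical and serve only to make the selection logic concrete.

```python
def select_best(image, variations, generator, discriminator):
    """Generate one candidate per variation and keep the one D scores highest."""
    candidates = [generator(image, v) for v in variations]
    scores = [discriminator(c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_generator(image, offset):
    """Stand-in for G: shift brightness by offset, clipped to [0, 255]."""
    return [min(255, max(0, p + offset)) for p in image]


def toy_discriminator(image):
    """Stand-in for D: score in [0, 1], higher when mean gray level is near 128."""
    mean = sum(image) / len(image)
    return 1.0 - abs(mean - 128) / 128


best_image, best_score = select_best(
    [100, 110, 120], [-20, 0, 20], toy_generator, toy_discriminator
)
```

In a real system the two stand-ins would be replaced by the trained networks G and D, while the selection step itself stays the same.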
In delving into the practical application of conditional generative adversarial networks (CGANs) and their image correction methods, it is especially crucial to illustrate their workings and effects through a concrete example. Suppose we are working on using CGAN to improve a set of low-quality images of historical documents that are blurred due to their age, and at the same time want to enhance the image clarity while being able to optimize the images according to the category of the documents (e.g., historical maps, manuscripts, or inscriptions) in a targeted manner.
Imagine this collection of documentary images includes images of different types of oracle bones, with many characters becoming illegible due to natural erosion and wear and tear over time. Our goal is to use the CGAN model to improve the image clarity while maintaining the original features of the ancient artifacts, so that scholars can conduct more accurate character recognition and historical research. In this process, the conditional information “y” will represent different categories of oracle bones, such as ritual records, weather observations, or war chronicles, which will help the model to understand and preserve the unique style and structure of each type of text.
First, a batch of clear and fuzzy oracle bone images is collected as a training set. The clear images serve as the real samples, while the fuzzy images serve as the degraded input that takes the place of the pure random noise “z.” At the same time, each image is labeled with the corresponding category label “y.” The CGAN model is then constructed: the generator G tries to generate oracle images that are close to the real ones and match the characteristics of the given category, based on the degraded input and the conditional label “y,” while the discriminator D learns to distinguish real images from the images synthesized by G. Through repeated iterations, G gradually learns how to reduce the blur of the image and improve its clarity while maintaining the category features. The loss function is designed so that G deceives D as much as possible while D strives to detect this deception, creating a dynamic equilibrium in which both networks continue to improve. Taking a fuzzy and illegible sacrificial-record oracle bone as an example, the image obtained after CGAN processing is not only much clearer overall, but also faithfully reproduces the tiny strokes and cracks of the oracle bone and preserves the arrangement of symbols and writing style unique to sacrificial records. Compared with the original fuzzy images, the corrected images allow scholars to identify specific content about the sacrificial rituals, such as the object and date of sacrifice and other key information, which greatly facilitates academic research.
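The opposing goals of G and D in this example can be made concrete with the binary cross-entropy losses computed from the discriminator's output probabilities. The sketch below is a plain-Python illustration under stated assumptions: it uses the common non-saturating generator loss, which is a standard training variant and not necessarily the one used in this paper, and the function names are hypothetical.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one probability p and a target of 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """D is rewarded for D(real) near 1 and D(fake) near 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating form: G is rewarded when D(fake) is near 1."""
    return bce(d_fake, 1.0)
```

When D outputs 0.5 for every image it can no longer separate real from generated samples, and the discriminator loss equals 2 ln 2, which is the equilibrium the adversarial game drives toward.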
Experimental evaluation
Experimental design
The experiments in this paper are designed to comprehensively evaluate, through comparative analysis, the practical performance of the GAN-based oracle image enhancement and calibration weighting method. The experiment consists of two core tasks. (1) Image enhancement task: The GAN-based method proposed in this paper is compared with existing image enhancement techniques, with the focus on improving the clarity and readability of the oracle images. The evaluation criteria are peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), which measure the quality difference between the enhanced image and the original image in terms of pixel-level information fidelity and overall structural consistency, respectively.26
(2) Image calibration task: Different image calibration methods are compared on their ability to recognize characters in oracle bones and to extract oracle bone information effectively. The evaluation benchmarks are the character recognition rate (CRR) and the information extraction rate (IER): the former reflects the accuracy of character recognition, and the latter reflects how effectively the content of the oracle bones is understood.27
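As a small illustration of two of these metrics, the sketch below computes PSNR from its standard definition, 10 log10(MAX^2 / MSE), and a character recognition rate taken as the fraction of correctly recognized characters. The CRR definition is an assumption, since the paper does not give its formula; SSIM and IER are omitted.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(enhanced):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def character_recognition_rate(predicted, truth):
    """Assumed definition: share of characters whose predicted label is correct."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("label lists must be non-empty and of equal size")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)
```

For example, a mean squared error of 1 on 8-bit images gives a PSNR of about 48.13 dB, and three correct labels out of four give a CRR of 0.75.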
In order to ensure the reliability and validity of the experimental results, two representative oracle image datasets, OBI-100 and OBI-300, are chosen. The OBI-100 dataset, provided by Anyang Normal University, contains 100 oracle images of lower quality with multiple interference factors, and is used to verify the processing capability of the image enhancement and reweighting algorithms in complex situations. The OBI-300 dataset is a collection of high-quality oracle images collected and labeled by the authors themselves, which is suitable for examining the effect of image enhancement.
The experiments were conducted on a machine running the Windows 10 operating system, with an Intel Core i7-9700K processor, 16 GB RAM, and an NVIDIA GeForce RTX 2070 graphics card. Python 3.7 and the PyTorch 1.8 deep learning framework were used for model development and training. With the above experimental design and objective evaluation system, the strengths and limitations of the proposed method in oracle image processing can be assessed systematically.28,29
Experimental results
This section presents and analyzes the experimental results of the proposed method and the comparison methods on image enhancement and image proofreading.
Image enhancement
In order to verify the effectiveness of the proposed method in image enhancement, it is compared with the following four baselines:
Original: The original image without any image enhancement processing.
Histogram Equalization (HE): a commonly used image enhancement method that improves the contrast of an image by adjusting the gray scale histogram of the image.
Adaptive Histogram Equalization (AHE): an improved histogram equalization method that can avoid over-enhancement or under-enhancement of the global histogram equalization by dividing the image into several small blocks, performing histogram equalization on each block, and then stitching them together.30
Retinex-based Image Enhancement (Retinex): an image enhancement method based on the human visual system, which can improve the brightness and detail of an image by decomposing the image into a reflection component and an illumination component, performing histogram equalization on the reflection component and low-pass filtering on the illumination component, and then synthesizing a new image.
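To make the simplest of these baselines concrete, the following sketch implements global histogram equalization on a flat list of 8-bit gray levels, using the standard cumulative-distribution mapping. It is an illustration only, not the implementation evaluated in this paper, and the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization for integer gray levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to spread out

    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]
```

A low-contrast input such as [1, 2, 3, 4] is stretched over the full range, which is the contrast gain that HE provides. AHE applies the same mapping block by block instead of globally.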
Table 1. PSNR and SSIM values for image enhancement methods.
From Table 1, it can be seen that this paper’s method outperforms the other four methods in both PSNR and SSIM indexes, which indicates that this paper’s method can effectively improve the clarity and readability of the oracle image and make it closer to the target image. Figure 1 shows a set of comparison of the effect of image enhancement, from which the advantages of this paper’s method can be visualized.
Image calibration
In order to verify the effectiveness of the proposed method in image calibration, it is compared with the following three image calibration methods:
Template Matching Based Image Reweighting (TM): a traditional image reweighting method that determines the class of a character by matching the image to be recognized with a pre-constructed character template on a pixel-by-pixel basis to find the most similar template.
Feature extraction-based image proofreading (FE): an improved image proofreading method that determines the class of a character by subjecting the image to be recognized to feature extraction, including edge detection, shape description, texture analysis, etc., and then comparing the extracted features with a pre-constructed feature library to find out the most similar features.
Deep learning-based image calibration (DL): a neural network-based image calibration method that determines the category of a character by feeding the image to be recognized into a trained neural network model, such as a convolutional neural network (CNN), and then outputting a probability distribution to find out the category that corresponds to the maximum probability.
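The template matching baseline (TM) is the easiest of the three to illustrate. The sketch below assigns a character image to the template with the smallest sum of absolute pixel differences; the distance measure, the function name and the glyph labels are assumptions chosen for illustration, since the paper does not specify them.

```python
def match_template(image, templates):
    """Return the label of the template closest to image, pixel by pixel."""
    def sum_abs_diff(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    return min(templates, key=lambda label: sum_abs_diff(image, templates[label]))


# Hypothetical 2x2 binary glyph templates, flattened row by row.
templates = {
    "glyph_a": [0, 1, 0, 1],
    "glyph_b": [1, 1, 1, 1],
}
label = match_template([0, 1, 0, 0], templates)
```

Because the comparison is pixel by pixel, this baseline is sensitive to wear, blur and variant writing styles, which is consistent with its role here as the traditional reference method.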
Table 2. CRR and IER values for image calibration methods.
From Table 2, it can be seen that this paper’s method is better than the other three methods in both CRR and IER indexes, which indicates that this paper’s method can effectively recognize the oracle characters and extract the oracle information, making it closer to the real oracle content.
Comparison of the results of Experiment III.
Conclusion
In this paper, an image enhancement and calibration system based on generative adversarial networks (GANs) is proposed to address the low quality of oracle bone images, which makes them difficult to recognize and interpret. The system can effectively restore the details and clarity of oracle bone images while maintaining their original features and layouts, thereby improving the readability and informativeness of the oracle bones. The main contributions and innovations of this paper are the following: (1) This paper applies GAN to the enhancement and proofreading of oracle bone images for the first time, using its powerful generative ability and adversarial learning mechanism to convert images from low to high quality while avoiding the distortion and artifacts of traditional methods. (2) A two-branch generator network G is designed, with one branch for image restoration and optimization and the other for image enhancement and calibration. By introducing conditional information and an attention mechanism, the generator adaptively adjusts its generation strategy and output according to the input image and target task, improving the quality and realism of the images. (3) A multi-task discriminator network D is employed to simultaneously evaluate the authenticity and quality of images, the recognition of oracle characters, and the extraction of information. By combining a multilayer perceptron (MLP) with a convolutional neural network (CNN), the discriminator efficiently extracts both global and local features of the image, improving its discriminative ability.
Extensive experiments have been conducted on two publicly available oracle image datasets, OBI-100 and OBI-300, with quantitative and qualitative comparisons against other image enhancement and calibration weighting methods. The results show that the proposed method has significant advantages in improving the clarity and readability of oracle images, as well as in accurately recognizing oracle characters and extracting oracle information. The method provides a new technical means for the study and inheritance of oracle bones, as well as an effective solution for other similar image enhancement and proofreading problems. Although the current study has successfully applied GAN to the recovery and optimization of oracle bone images, future work can explore how to combine traditional image processing techniques with deep learning methods so that they complement each other, for example by using morphology and frequency domain analysis to assist in detail recovery.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Humanities and Social Sciences Research Planning Fund of the Chinese Ministry of Education: “Research on the Corpus Collation and Visual Schema of Traditional Chinese skills Seen in Oracle Bones” (No. 23YJAZH165); National Social Science Foundation Key Project “Research on the Phenomenon of oracle Scraping and Re-engraving in Yinxus and Related Corpus” (No. 22AYY016).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
