Abstract
In film and television production, efficient and precise image processing is vital for achieving realistic visual effects, and advanced image processing technologies have become an essential means of raising production quality. This work investigates the application of artificial intelligence (AI) technology to the processing and production of animated images in film and television scenarios. A standard Generative Adversarial Network (GAN), DenseNet, and CycleGAN are compared under different noise conditions, and CycleGAN performs best in image denoising and detail restoration. Experimental results show that CycleGAN achieves a Peak Signal-to-Noise Ratio (PSNR) of 30.1 dB and a Structural Similarity Index Measure (SSIM) of 0.88 under Gaussian noise, and a PSNR of 29.5 dB and an SSIM of 0.85 under salt-and-pepper noise, outperforming the other models in both conditions. CycleGAN’s mean absolute error (MAE) is also significantly lower than that of the other models. These results indicate that CycleGAN can handle complex noise more effectively and generate high-quality images under unsupervised learning conditions. The findings offer a reference for model selection in practical applications, establish a theoretical foundation for applying advanced AI techniques in film and television production, and point to effective solutions for efficient production and quality enhancement of future film and television works.
Keywords
Introduction
With the development of the Internet, artificial intelligence (AI) has become one of the three major cutting-edge technologies and has been applied in many fields. It is also applied in the film and television industry, where it provides new methods for creation. It is typically used for script creation, image processing, and film and television editing, which lowers creation costs, shortens creation time, and improves the quality of works [1]. Che argued that film creation cannot be separated from technological support, and that films reflect the evolution and progress of science and technology [2]. Han discussed how digital twins can be used to study Chinese film history from a more diversified perspective and to promote the development of Chinese films in the digital age [3]. Li et al. noted that the demand for computer image processing technology (CGI) was higher in the digital era. The long take rearranges geometrically continuous images through virtual imaging technology and composites them in post-production, and virtual reality (VR) enables film and television production in scenes without cameras [4]. Huang and He stated that 5G-era technologies such as AI, digitization, mobile transmission, the Internet of Things (IoT), and cloud services have a strong and far-reaching impact on culture, society, new media, new art, new literature, and the cultural and creative industries [5].
For example, in the “Avengers” series, AI technology is extensively used in computer graphics and visual effects to create realistic animated characters and complex scene backgrounds through deep learning models. With AI-assisted image processing techniques, these films not only achieve highly detailed image synthesis but also significantly improve post-production efficiency. According to public data, the adoption of AI technology has shortened production cycles by approximately 30% and reduced production costs by nearly 20%. Additionally, in domestic films like “The Wandering Earth,” AI technology has enabled large-scale disaster scene simulations, bringing the film’s visual effects to an internationally leading level.
This work references the latest research achievements in related fields. For example, it discusses the use of robust quadratic polynomial hyper-chaotic mapping and pixel fusion strategies for efficient image encryption, providing new ideas for the security of film and television images [6]. This work also draws on the application of advanced deep learning models in the automatic detection and recognition of chewable food items [7]. These technologies demonstrate the broad application prospects of deep learning in image processing. Through the integration and innovation of these studies, this work further expands the application scope of AI technology in film and television image processing.
Combining AI with the artistic sensibilities of film and television image creators allows for the restoration and enhancement of digital images. The goal is for AI to learn the aesthetic thoughts, concepts, and processes of artists and then translate them into aesthetic mathematical models. Compared to traditional image processing techniques, the research innovation lies in the application of advanced deep learning models such as Generative Adversarial Network (GAN), DenseNet, and CycleGAN. These models achieve unprecedented results in image enhancement, denoising, restoration, and generation. For instance, GAN can generate high-quality images that are highly similar to the original images through adversarial training between its generator and discriminator; DenseNet improves feature reuse and model training efficiency through dense connections; CycleGAN enables unsupervised image-to-image translation. By combining these technologies, this work significantly enhances the efficiency and quality of film and television image processing and production, offering new models and methods for the future development of the field.
The logical structure of this work is as follows. First, it introduces the application of AI in film and image processing, focusing on the principles and applications of deep learning techniques such as GAN, DenseNet, and CycleGAN in image generation, denoising, and enhancement. Next, it analyzes the performance of different models in handling noisy images and demonstrates their effectiveness through experimental results. Subsequently, the advantages and limitations of the CycleGAN model in processing film images are discussed in detail, particularly its contributions to improving image quality and production efficiency. Finally, this work summarizes the prospects for AI technology in film and image processing and suggests future research directions, aiming to provide valuable insights for technological development in this field.
This work both enriches the application scenarios of image processing technology in film and television production and provides new insights into the potential of AI in animation image processing. Amid the growing demand for high-quality content in the film industry, the findings offer new technological tools for the industry. By analyzing and comparing the performance of different models, this work confirms CycleGAN’s strong performance in image denoising and detail restoration and provides a reference for applying such technologies in more complex scenarios in the future.
The main contribution of this work lies in combining CycleGAN with high-performance computing for film image processing, particularly for image denoising and detail recovery under complex noise conditions. The experimental data demonstrate the performance of the CycleGAN model in unsupervised learning and validate its advantages in enhancing image quality and processing efficiency. High-performance computing is a crucial component of this work: through parallel processing and large-scale data computation, it substantially increases the training and inference speed of the model and enables the effective handling of the vast amounts of high-resolution image data in the film industry. This work illustrates how high-performance computing can be leveraged to optimize the efficiency of the CycleGAN model, achieving real-time processing for image denoising and detail recovery, and thereby provides technological support for future applications in larger-scale and more complex image processing scenarios.
Application of AI to image processing and production
AI
AI is a branch of science and technology concerned with the theory and application systems used to simulate human intelligence. It can be realized on a computer in two different ways: the engineering method and the simulation method [8]. The artificial neural network (ANN) is a simulation method; its principle is to simulate certain human learning behaviors through corresponding computer technology, as shown in Fig. 1.
Artificial neural network model.
An ANN is composed of an input layer, a transmission layer, and an output layer, which are used here to compress the image. Figure 1 shows that an ANN can compress multiple nodes to save storage space and improve transmission speed [9].
In recent years, with the rapid progress of big data and computational power, AI technology has made significant progress, particularly in the field of deep learning. Deep learning, which uses multi-layer neural networks to model complex data, has been successfully applied in various domains such as image recognition, natural language processing, and autonomous driving. In the film and television industry, AI technology is widely used in scriptwriting, image processing, video editing, and special effects production, significantly enhancing creative efficiency and the quality of works. Specifically, deep learning models such as the GAN, Convolutional Neural Network, DenseNet, and CycleGAN have shown outstanding performance in image generation, enhancement, denoising, and transformation, providing powerful tools for film and television image processing.
Traditional image processing methods face several challenges when dealing with large-scale film production:
Long Processing Times: These methods rely on manual or semi-automated techniques, resulting in slow processing speeds that struggle to meet the demands of high-resolution films. High-quality images often require significant computational resources, leading to low processing efficiency.
Suboptimal Noise Handling: When confronted with complex noise (such as high-frequency or mixed noise), traditional methods like Gaussian filtering and median filtering can partially reduce noise but often come at the cost of detail loss and image blurriness.
Limited Resolution Enhancement: Conventional interpolation methods, such as bilinear and bicubic interpolation, can only improve image resolution to a limited extent, yielding unsatisfactory results that cannot meet the demands of HD or ultra-HD content.
By introducing deep learning models (such as GAN, CycleGAN, and DenseNet), this work overcomes the limitations of traditional methods. These models leverage rich feature representation capabilities, significantly enhancing processing speed and achieving substantial advancements in denoising, image enhancement, and resolution improvement, while ensuring the preservation of image details and overall quality enhancement.
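As a point of reference for the traditional baseline mentioned above, the following minimal sketch implements a median filter in plain Python. It is an illustration only, not the processing pipeline used in this work: the function name `median_filter`, the `radius` parameter, the cropped-window border handling, and the representation of a grayscale image as nested lists of floats in [0, 1] are all assumptions made for the example.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighbourhood.

    `image` is a list of rows of floats (hypothetical representation).
    Windows are cropped at the image border instead of padded.
    """
    height, width = len(image), len(image[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered
```

An isolated corrupted pixel is removed by this filter, but a one-pixel-wide line or a fine texture would be erased in the same way, which is the detail loss that motivates the learned models.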
AI needs to construct its own database for artistic creation by learning from a large body of knowledge. GAN is a popular AI-based method for creating works of art; it can supervise the images generated by a CNN and reduce the difference between the generated and the original images [10]. It is composed of two parts: the generative network and the discriminative network. The former generates data and is known as the generator, while the latter acts as a judge and determines whether the data are real [11]. Figure 2 shows the structure of GAN.
GAN model structure.
Figure 2 indicates that sample
Equation (2) describes the discriminator model after training in a GAN. During the training process of a GAN, the discriminator’s objective is to maximize the log-likelihood function, enabling it to correctly distinguish between real data and generated data. Conversely, the generator’s objective is to minimize this log-likelihood function, making the generated data as close to the real data distribution as possible. After training, the discriminator should be unable to distinguish between generated data and real data, meaning its output is close to 0.5. This indicates that the generator has successfully deceived the discriminator, making the generated data indistinguishable from real data. This model improves the quality of the generated data step by step by optimizing the adversarial process between the generator and the discriminator.
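For reference, the adversarial objective described in this paragraph is usually written in the following standard form. This is the conventional formulation from the GAN literature, given here as an aid to the reader; the symbols ($G$, $D$, $p_{\text{data}}$, $p_z$, $p_g$) are the customary ones and are an assumption, since they may differ from the notation of Eqs. (1) and (2) in this work.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the optimal discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generated distribution matches the real one ($p_g = p_{\text{data}}$), the second expression gives $D^{*}(x) = 1/2$, which is the 0.5 output mentioned above.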
After the discrimination model and the generation model are confronted, the
This work employs GAN for image generation and enhancement. Specifically, the GAN model consists of two main components: the generator and the discriminator. The generator is responsible for creating new images, while the discriminator is used to determine whether an image is real or generated. The generator uses a CNN structure, transforming random noise into high-quality images through a series of convolutional layers, batch normalization layers, and activation function layers. Similarly, the discriminator uses a CNN structure, extracting features and classifying the input images through multiple convolutional and pooling operations. Additionally, DenseNet and CycleGAN technologies are introduced to improve the model’s generation capability and stability.
The model components are as follows:
Generator: Input layer: Random noise vector; Multiple convolutional layers: Feature extraction; Batch normalization layers: Accelerate training; Activation function layers: Introduce non-linearity; Output layer: Generated image.
Discriminator: Input layer: Image; Multiple convolutional layers: Feature extraction; Pooling layers: Reduce parameters; Activation function layers: Introduce non-linearity; Output layer: Classification result (real or generated).
DenseNet is a network with dense connections. It connects each layer to all subsequent layers and consists of dense connection blocks and transition sections [14]. Figure 3 shows the principle of DenseNet.
DenseNet network structure diagram.
As shown in Fig. 3, DenseNet combines features by concatenating the outputs of preceding layers, and each layer consists of batch normalization (BN), a rectified linear unit (ReLU), and a convolution operation (Conv). Through these dense connections, the network achieves feature reuse and requires fewer parameters than a traditional network. The dense connections also promote the back propagation of gradients and make the model easier to train.
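The dense connectivity pattern can be sketched in a few lines of plain Python. This is a toy illustration, not the network used in this work: features are plain lists of numbers rather than feature maps, and the names `dense_block`, `dense_block_channels`, and `growth_rate` (the number of new feature maps each layer contributes, a term borrowed from the DenseNet literature) are assumptions introduced for the example.

```python
def dense_block(features, layers):
    """Apply layers so that each one sees ALL previously produced features.

    `features` is a list of numbers standing in for feature maps, and each
    entry of `layers` is a function mapping a feature list to new features.
    """
    collected = list(features)
    for layer in layers:
        new_features = layer(collected)        # input: everything so far
        collected = collected + new_features   # concatenate, do not replace
    return collected


def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return (input channels seen by each layer, channels leaving the block)."""
    per_layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        per_layer_inputs.append(channels)
        channels += growth_rate  # each layer appends `growth_rate` maps
    return per_layer_inputs, channels
```

Because every layer only adds a small number of new feature maps and reuses the earlier ones, the block stays compact; with the illustrative values `in_channels=64`, `growth_rate=32`, and four layers, the layer inputs grow as 64, 96, 128, 160 and the block outputs 192 channels.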
CycleGAN is one of the most commonly used ANN architectures. It trains two generative adversarial networks jointly and exchanges data between the two networks [15]. Figure 4 shows the architecture of CycleGAN.
Architecture of CycleGAN.
Figure 4 shows that CycleGAN is a ring model consisting of two symmetric generative adversarial networks. Its image generation is bidirectional, and the two networks share generators
CycleGAN can also generate a new image
The overall objective function is established to narrow the gap between the new and the original image, as shown in Eq. (5).
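For orientation, the full objective commonly used for CycleGAN combines two adversarial losses with a cycle-consistency term. The form below follows the standard CycleGAN formulation; the symbols ($G: X \to Y$, $F: Y \to X$, discriminators $D_X$ and $D_Y$, and the weight $\lambda$) are the conventional ones and are an assumption, since they may not match the notation of Eqs. (3) to (5) in this work.

```latex
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)

\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

The cycle-consistency term penalizes the difference between an image and its reconstruction after a round trip through both generators, which is what narrows the gap between the new and the original image without requiring paired training data.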
Network degradation can be mitigated with a residual network, which improves the training results of the model [17]. Figure 5 shows the generator architecture of CycleGAN after the residual network is added.
Generator’s architecture of CycleGAN.
Figure 5 demonstrates that six residual modules are added in the middle of the CycleGAN generator to prevent network degradation, so that the image is de-shadowed and the gap with the real image is reduced.
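The idea behind a residual module can be shown with a toy sketch: each module outputs its input plus a learned correction, so a module that has learned nothing still passes the signal through unchanged. The code below is only a schematic under stated assumptions. Features are flat lists of floats, `transform` stands in for the convolutional layers inside a module, and the names `residual_block` and `residual_stack` are hypothetical.

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element (the skip connection)."""
    correction = transform(x)
    return [xi + ci for xi, ci in zip(x, correction)]


def residual_stack(x, transforms):
    """Chain several residual blocks, e.g. six as in the generator above."""
    for transform in transforms:
        x = residual_block(x, transform)
    return x
```

With six transforms that all return zeros, `residual_stack` returns its input unchanged. This is why adding such modules does not make a deeper generator harder to train than a shallower one: the identity mapping is always available.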
“Zhongying
Noise and model configuration
To evaluate the performance of the models under different noise conditions, a series of experiments is designed to compare the effects of different configurations. Gaussian noise and salt-and-pepper noise are added to the input images, and the denoising performance of standard GAN, DenseNet, and CycleGAN is tested. The specific settings are as follows. For Gaussian noise, the mean is set to 0 and the standard deviation is increased incrementally from 0.01 to 0.05 to simulate different noise intensities. For salt-and-pepper noise, the noise density is increased progressively from 0.01 to 0.05 to simulate varying levels of image degradation. Table 1 shows the results.
Noise and model configuration
By comparing the performance of different model configurations in noise removal, image enhancement, and detail restoration, it is found that CycleGAN, under unsupervised learning conditions, can more effectively handle complex noise and generate high-quality images. The experimental results indicate that using DenseNet’s generator and discriminator structure can significantly improve the robustness and image quality of the model.
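The two noise models used in the experiments can be sketched as follows. This is a minimal illustration rather than the code used in this work: images are assumed to be nested lists of grayscale intensities in [0, 1], the step of 0.01 between noise levels is an assumption (the text only gives the range 0.01 to 0.05), and the function names and the `seed` parameter are hypothetical.

```python
import random

# Noise levels swept in the experiments; the 0.01 step is an assumption.
NOISE_LEVELS = [0.01, 0.02, 0.03, 0.04, 0.05]


def add_gaussian_noise(image, std, mean=0.0, seed=None):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(mean, std))) for pixel in row]
        for row in image
    ]


def add_salt_and_pepper(image, density, seed=None):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt)."""
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            draw = rng.random()
            if draw < density / 2:
                new_row.append(0.0)   # pepper
            elif draw < density:
                new_row.append(1.0)   # salt
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy
```

Calling each function once per entry of `NOISE_LEVELS` yields the graded set of degraded inputs on which the models are compared.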
The model training employs an adversarial training algorithm. The specific steps are as follows:
Data Preparation: A large number of film and television images are obtained from public datasets and preprocessed (such as cropping, scaling, and normalization). This work uses the Flickr30k dataset.
Parameter Initialization: The weights and biases of the generator and discriminator are randomly initialized.
Training Loop: A batch of real images and noise vectors is randomly selected. The generator generates fake images, which are input into the discriminator. The discriminator classifies the real and generated images and calculates the loss function. The discriminator’s weights are updated to more accurately distinguish between real and generated images. The generator’s loss function is calculated using the generator’s output and the discriminator’s feedback. The generator’s weights are updated to produce more realistic images. The above process is repeated until the loss function converges.
Training uses the Adam optimizer with an initial learning rate of 0.0002 and a batch size of 64. Each model is trained for 100 epochs, with intermediate results saved every 10 epochs for subsequent analysis. To ensure training stability, all models are trained in the same hardware environment using a deep learning framework accelerated by a Graphics Processing Unit (GPU). To enhance model generalization and reduce the risk of overfitting, L2 regularization and Dropout (set at 0.5) are applied to the generator and discriminator, respectively, and early stopping is introduced. The learning rate is dynamically adjusted (starting from 0.0002 and gradually decaying) to ensure effective convergence in the later stages of training. After several rounds of training, the model’s performance on the validation set reaches the expected results, and the quality of the generated images is significantly improved.
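The scheduling logic around training can be sketched independently of any deep learning framework. The text specifies the starting learning rate (0.0002), the number of epochs (100), and the checkpoint interval (10 epochs), but not the decay shape or the early-stopping criterion. The linear decay over the second half of training, the `patience` value, and all function and class names below are therefore assumptions chosen for illustration.

```python
def learning_rate(epoch, base_lr=0.0002, total_epochs=100, decay_start=50):
    """Constant rate, then linear decay to zero (assumed schedule)."""
    if epoch < decay_start:
        return base_lr
    remaining = total_epochs - epoch
    return base_lr * remaining / (total_epochs - decay_start)


def should_checkpoint(epoch, every=10):
    """True after epochs 10, 20, ... when `epoch` is counted from zero."""
    return (epoch + 1) % every == 0


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, validation_loss):
        if validation_loss < self.best - self.min_delta:
            self.best = validation_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `learning_rate(epoch)` would be passed to the optimizer at the start of each epoch, and `EarlyStopping.step` would be called with the validation loss at its end.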
In a high-performance computing environment, preprocessing techniques are essential for ensuring the effectiveness and efficiency of image processing. The main preprocessing steps include:
Denoising: Methods such as Gaussian filtering and bilateral filtering are used to remove random noise from images. Denoising techniques can significantly enhance clarity and visual quality during film production.
Image Enhancement: To improve detail representation in film images, enhancement techniques like histogram equalization and Laplacian operator enhancement are commonly employed. These methods increase contrast and highlight important details.
Resolution Enhancement: Super-resolution reconstruction techniques are used to upscale low-resolution images to HD or ultra-HD resolutions. By utilizing deep learning models, such as GAN or Convolutional Neural Network, significant improvements in resolution can be achieved while preserving image details.
The integration of these preprocessing steps with high-performance computing enables real-time processing in large-scale image applications, greatly enhancing both the efficiency and quality of film production.
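Of the preprocessing steps listed above, histogram equalization is the simplest to show in full. The sketch below is a plain-Python illustration, not the implementation used in this work. It assumes an 8-bit grayscale image stored as nested lists of integers in 0 to 255, and the function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Spread the grey levels of an integer image over the full range."""
    pixels = [p for row in image for p in row]
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Standard mapping: the darkest occupied level goes to 0, the brightest to levels - 1.
    lookup = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lookup[p] for p in row] for row in image]
```

For a low-contrast input such as `[[10, 10, 10], [20, 30, 40]]`, the occupied grey levels are remapped to 0, 85, 170, and 255, which is the contrast increase described in the list above.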
Performance comparison of different models in noise image processing
Experimental data are adopted to further explain the results of image processing. A raw image, the same image with added Gaussian noise, and the image with added salt-and-pepper noise are selected as subjects. Table 2 shows the performance data of different models in processing these noisy images.
Performance comparison of different models in noise image processing
Table 2 shows that CycleGAN achieves higher PSNR and SSIM values than the other models when processing images with Gaussian and salt-and-pepper noise. This indicates that CycleGAN performs best in denoising and image restoration. Its lower MAE values further demonstrate the superiority of CycleGAN.
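The three metrics reported in Table 2 can be computed as in the following sketch. It is a simplified illustration, not the evaluation code of this work. Images are assumed to be nested lists of intensities in [0, 1], and `global_ssim` evaluates the SSIM formula once over the whole image, whereas the standard SSIM averages it over local windows, so its values will generally differ from library implementations. All function names are hypothetical.

```python
import math


def _flatten(image):
    return [p for row in image for p in row]


def mae(reference, restored):
    """Mean absolute error between two images of the same size."""
    x, y = _flatten(reference), _flatten(restored)
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    x, y = _flatten(reference), _flatten(restored)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, restored, max_value=1.0):
    """Single-window SSIM over the whole image (simplification, see lead-in)."""
    x, y = _flatten(reference), _flatten(restored)
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_value) ** 2   # usual stabilising constants
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Higher PSNR and SSIM and lower MAE all indicate a restored image that is closer to the reference, which is how the values in Table 2 should be read.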
In the production of “awesome, China”, “Zhongying-Shensi” focuses on two aspects. One is enhancing the resolution [20], as shown in Fig. 6. The other is improving the restoration, which includes raising the image resolution to 2K, removing artifacts of the original image as well as the moiré and jagged edges that appear after transformation, filling in image details, and enhancing image quality [21]. The processing takes 26 minutes 45 seconds.
Comparison of image enhancement effects in “awesome, China”.
The shot is de-fielded, raised to 2K resolution, and cleared of jagged edges after the transformation, which takes 47 minutes 40 seconds [22], as shown in Fig. 7.
Comparison of image de-fielding effects in “awesome, China”.
Figure 7 shows that the original material of “awesome, China” is repaired or de-fielded, so that the image is more vivid and clear. Based on AI, tens of thousands or even millions of frames of images can be processed per day, and the processing speed is remarkable.
The function of “Shensi” in film and television image production is restoration. When “Zhongying
Transformation of TV standard materials into high definition (HD) materials
The Zhongying base has extended this work from film to TV series, and its experiments on transforming standard-definition TV material to HD, 2K, and 4K have been successful. Figure 8 shows the image transformation of an episode of the classic TV series “Blood Romance”, in which the resolution is raised from 720 × 576 to 1920 × 1080 (HD), 2048 × 1556 (2K), and 4096 × 3112 (4K). De-fielding and image improvement are achieved together with the resolution increase [24]. For most of the marked material, the real details of the images can be raised to 2K resolution through this system, and the processing results and efficiency are significantly better than those of the traditional scoring method.
Comparison of image enhancement effects for the TV drama “Blood Romance”.
The advantage of hand-painted color is that the coloring is accurate and the color balance is attractive, but the efficiency is low. A skilled image artist needs several days to hand-paint a digital image, which makes it impossible to color an entire film by hand. Therefore, only the key shots in the film are hand-painted, and AI is used to learn how to color. In Fig. 9, the black and white film “Road Angels” is colored using “Zhongying
Black and white image of “Road Angels” compared with the AI-colored version.
Digital film toning is a necessary process in film production. Films that need to be processed are given a similar visual style: existing color-grading software is analyzed, and AI is used to learn from it. In the process, the coloring concepts and habits of the colorist are also learned, and a more intelligent, convenient, and concise color-grading plug-in is developed, which can inspire the imagination and creativity of directors and colorists and greatly improve work efficiency. Figure 10 shows the color toning of “Wolf Totem” [26].
AI color style conversion in “Wolf Totem”.
Huia is a prop often used in film and television production. In the past, removing Huia from films and TV series was usually done manually by VFX technicians, including, in particular, the filling of details after Huia is removed from scenes with multi-layer complex motion [27]. Traditional processing methods struggle with large-scale moving backgrounds, and regions where Huia lies close against the subject cannot be filled from other frames. Breakthroughs in basic research now make it possible to achieve inter-frame continuity and to have AI fill in the details. Figure 11 shows the removal of Huia in the shooting.
Comparison of AI-based Huia removal effects in a close-contact scene.
Yar et al. pointed out that using CycleGAN offered significant advantages in image denoising and enhancement [28]. Liu et al. compared the performance of different generative models in image restoration and found that CycleGAN outperformed other models in terms of PSNR and SSIM metrics [29]. These findings are consistent with the results of this work.
First, CycleGAN demonstrates higher PSNR and SSIM values when handling Gaussian noise, indicating its strong capability in smoothing images while maintaining structural integrity. However, when dealing with salt-and-pepper noise, although CycleGAN still outperforms standard GANs and DenseNet, its denoising effectiveness is relatively poorer. This suggests that CycleGAN may require further optimization or integration with other denoising methods to handle high-intensity discrete noise effectively. Additionally, CycleGAN’s lower MAE value reflects its advantage in overall image restoration accuracy, though it also results in some loss of texture details. Particularly in images with rich high-frequency details, this is a factor that needs improvement in future research.
Compared to other studies, Gurrola-Ramos et al. used a U-Net-based denoising model, which achieved slightly lower PSNR under the same Gaussian noise conditions [30]. Similarly, Marcos et al. found that a ResNet-based denoising model had lower PSNR when handling salt-and-pepper noise [31]. In contrast, the CycleGAN model used here excels in both PSNR and SSIM metrics, especially in Gaussian noise processing. The low MAE (0.038) also indicates strong precision in overall image restoration, despite some texture detail loss.
The findings of this work hold significant potential value for practical applications. CycleGAN’s advantages in denoising and detail restoration can directly enhance visual effects in film and television productions, particularly improving the image quality in old film restoration and high dynamic range scene enhancement. Moreover, AI technology enables film production companies to significantly reduce post-production time and costs while ensuring image quality. In large-scale film or TV production, CycleGAN and similar AI models can replace traditional manual processing, greatly improving efficiency.
Although CycleGAN has demonstrated excellent image denoising and detail restoration capabilities, the model still has some limitations. Its high demand for computational resources, particularly for high-resolution images, increases training and inference time. Additionally, CycleGAN’s performance in handling specific types of noise, such as strong salt-and-pepper noise or high-frequency texture noise, may not meet expectations because of the model’s limited ability to model complex noise in an unsupervised learning setting. Therefore, practical applications should weigh the computational cost against the model’s performance based on specific scenarios.
Conclusions
This work validates the superior performance of the AI-based CycleGAN model in film image processing, enhancement, and restoration under complex noise conditions. CycleGAN effectively removes blemishes and noise from old images, while precisely restoring image details and improving overall resolution. Compared to traditional image processing methods, CycleGAN demonstrates higher efficiency and image quality, particularly excelling in denoising and detail recovery under unsupervised learning conditions. These findings provide robust technical support for digital processing in the film industry and showcase the significant potential of AI technology in enhancing productivity and creative quality. However, CycleGAN faces challenges and limitations in practical applications. For example, its high computational resource consumption may restrict its use in large-scale high-resolution image processing. Additionally, CycleGAN’s performance in handling specific types of noise, such as high-frequency texture noise, has not yet reached optimal levels, suggesting a need for integration with other methods or further optimization of the model structure. Future research could focus on reducing computational costs while improving the model’s ability to handle complex noise, and exploring how to better leverage AI advantages in various application scenarios.
Footnotes
Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data sharing agreement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
