Abstract
With the rapid development of deep learning technologies, improving the quality and variety of generated images has become a focus of computer vision research. This paper proposes an image generation approach based on a deep learning (DL) long short-term memory (LSTM) network, which leverages the long-term memory capacity of the LSTM to extract richer semantic information and improve generation quality. A generation model that uses an LSTM as the encoder is designed and studied. The experiments are conducted on a computer equipped with a high-speed graphics processing unit (GPU) and large memory, using a dataset of multi-class images. With a random forest network model as the control group, the paper evaluates peak signal-to-noise ratio (PSNR), structural similarity, and generation accuracy. The results show that the proposed model's PSNR and structural similarity are superior to those of the control group. The PSNR reaches 20.5 dB, and the structural similarity reaches 0.8 after 100 iterations. After 600 iterations, the image generation accuracy reaches 99%, which is also higher than that of the control group. These results indicate that the LSTM-based DL model can extract long-term semantic information from an image and produce images of higher quality and accuracy. The paper offers a reference for refining DL-based image generation techniques.
Keywords
Introduction
Research background and motivations
Image generation has long been one of the most significant research areas in computer vision.1,2 Thanks to the rapid development of DL technology, DL models have brought major advances to the image generation task. The LSTM network is a DL model designed to handle sequence data, and it has proven highly effective in time series analysis and natural language processing. Against this background, this paper explores automatic image generation (IG) methods based on DL LSTM networks to improve the quality and diversity of generated images.3–5
Generative Adversarial Networks (GANs) have achieved great success in IG. However, traditional GAN models still face challenges in generating high-quality images.6–8 One of them is balancing the authenticity and diversity of generated images: to produce realistic images, models tend to generate samples similar to the training data, which leaves the generated images insufficiently diverse. As a neural network that can remember long-term dependencies, the LSTM model has certain advantages in generating diverse outputs and is expected to alleviate this problem in IG.9,10
Furthermore, most current image generation techniques rely on an encoder-decoder structure, in which the encoder maps the input image to a vector representation in the latent space and the decoder maps this representation back to the original image space.11,12 However, the classic encoder-decoder structure often struggles with complicated image content and structure.13–15 This paper addresses the issue by introducing the LSTM model as an encoder component. By learning long-term dependencies within the image, the encoder extracts richer semantic features, which improves the model's capacity to generate images.
Research objectives
Given this context, the purpose of this paper is to present a DL LSTM network-based image generation technique that achieves varied, high-quality IG by combining the memory capacity of LSTM with the strengths of GANs. The paper implements a novel LSTM-based image generation model. In many cases, the encoder in the conventional encoder-decoder structure cannot accurately capture long-term dependencies in the image. To extract deeper semantic features from the image, this paper incorporates the LSTM model into the encoder. The findings have theoretical and practical significance for computer vision and support the development of more powerful image generation models.
Literature review
This paper concerns automatic IG based on a DL LSTM network. As an important DL model, the LSTM network has shown excellent performance in many application fields. Kwak et al. (2020) 16 studied the potential of bidirectional LSTM networks for crop classification and found that the method classified crops effectively. Nabati et al. (2020) 17 studied video caption generation using boosted and parallel LSTM networks; introducing boosting and a parallel mechanism into the model improved captioning performance. Hao et al. (2020) 18 studied multi-sensor bearing fault diagnosis with a one-dimensional convolutional LSTM network, and the results suggested that the method diagnosed bearing faults effectively. Haputhanthri et al. (2021) 19 studied solar irradiance prediction for virtual power plants using multi-modal LSTM networks, improving prediction accuracy by using data from various sensors. Zhang et al. (2021) 20 studied a weather radar echo prediction method based on a convolutional neural network (CNN) and an LSTM network for sustainable e-agriculture. Their results showed that the method predicted weather radar echoes effectively and provided support for sustainable e-agriculture. Lindemann et al. (2021) 21 surveyed the characteristics and applications of LSTM networks in time series prediction and concluded that LSTM networks perform well on this task.
These studies show the importance and wide application of LSTM networks in DL. By exploiting the memory of LSTM and its ability to model long-term dependencies, researchers have made remarkable progress in automatic image generation, classification, caption generation, and fault diagnosis.22–25 Given the ongoing advances in DL technology, it is reasonable to expect that LSTM and its applications will open up further possibilities.26–28
Research model
Comprehensive training and analysis of IG model
Large-scale real image datasets are gathered, and rigorous screening and preprocessing are performed to ensure the quality of the training data and to remove noise and unnecessary variation.29–31 This constitutes the dataset preparation and management component. The LSTM-based image generation model and data coding are the crucial components of the design. By introducing the LSTM model into the encoder, the model can make full use of its long-term memory ability, capture long-term dependencies in the image, and extract richer semantic features. This architecture improves the quality and variety of the generated images and helps the model handle images with complicated content and structure. The model is trained with the GAN method: adversarial training between the generator and the discriminator drives the model to learn to produce authentic and varied images. To achieve the best generation quality, the training of the generator and discriminator is guided by an appropriate loss function and optimization method.32–35
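The adversarial objective described above can be illustrated with a minimal sketch. The paper does not state which loss function it uses, so the standard binary cross-entropy GAN losses (with the non-saturating generator loss) are assumed here, and the function names `discriminator_loss` and `generator_loss` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator (assumed loss, not from the paper).

    d_real: discriminator outputs (probabilities in [0, 1]) on real images
    d_fake: discriminator outputs on generated images
    """
    real_term = sum(-math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator scores fakes as real."""
    return sum(-math.log(p + EPS) for p in d_fake) / len(d_fake)
```

In an alternating training loop, the discriminator would be updated to lower `discriminator_loss` and the generator to lower `generator_loss`, which is the competition the text refers to.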
Design of data coding and IG model based on LSTM
This paper focuses on the design of a data coding and IG model based on DL LSTM. Introducing LSTM as part of the encoder is intended to improve the model's understanding of complex image content and structure, and thereby to achieve high-quality and diverse IG.36–39 The core of the design is to embed the LSTM model into the encoder to make full use of its long-term memory ability. During training, the generator and discriminator compete with one another. This gradually improves both the generator's ability to produce realistic images and the discriminator's ability to distinguish real images from generated ones.40,41 This adversarial interplay encourages the model to learn to produce authentic and diverse images. Figure 1 depicts the structure of the LSTM-based data coding and image generation model.
Figure 1. Structure frame diagram of data coding and IG model based on LSTM.
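As an illustration of how an LSTM can act as an image encoder, the following is a minimal pure-Python sketch. It assumes each image row is fed to the LSTM as one time step and that the final hidden state serves as the latent code; the paper does not specify how the image is serialized, and the names `init_lstm`, `lstm_step`, and `encode_image` are hypothetical. Weights are random and untrained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Small random weights for the four gates: input (i), forget (f), candidate (g), output (o)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for gate in ("i", "f", "g", "o")
    }


def affine(params, vec):
    """Compute W @ vec + b for one gate."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(params["W"], params["b"])]


def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and the new cell state."""
    z = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(a) for a in affine(params["i"], z)]
    f = [sigmoid(a) for a in affine(params["f"], z)]
    g = [math.tanh(a) for a in affine(params["g"], z)]
    o = [sigmoid(a) for a in affine(params["o"], z)]
    # The cell state keeps part of the old memory (f) and adds new content (i * g).
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_image(params, image, hidden_size):
    """Feed image rows to the LSTM one at a time; the final hidden state is the latent code."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for row in image:
        h, c = lstm_step(params, list(row), h, c)
    return h


# Usage: a 4x4 grayscale image with pixel values scaled to [0, 1].
image = [[(r * 4 + col) / 15.0 for col in range(4)] for r in range(4)]
params = init_lstm(input_size=4, hidden_size=3)
latent = encode_image(params, image, hidden_size=3)
```

The cell state `c` is what carries information across rows, which is the long-term memory the design relies on. A practical model would use a trained LSTM layer from a DL framework and pass `latent` to the generator.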
Experimental design and performance evaluation
Experimental materials
In the experiment, an extensive image dataset is used as the experimental material to ensure the reliability and generalization ability of the results. The dataset contains different categories and themes, covering natural scenery, portraits, animals, objects, and other fields. This diverse dataset is chosen to test the IG ability of the model across different scenes and contents. The images are preprocessed to guarantee the quality and uniformity of the dataset.42 To support training, tuning, and assessment of the model, the data are divided into a training set, a validation set, and a test set. The model's parameters are learned on the training set, hyperparameter selection and optimization are done on the validation set, and performance is assessed and compared on the test set.
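The three-way split described above can be sketched as follows. The paper does not report the split ratios, so the 80/10/10 proportions, the fixed seed, and the file names are assumptions made for illustration only.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items and split them into training, validation, and test lists."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything left over
    return train, val, test


# Usage with hypothetical file names.
paths = [f"img_{k:04d}.png" for k in range(1000)]
train, val, test = split_dataset(paths)
```

Shuffling before slicing keeps the categories mixed across the three subsets; for a strongly imbalanced dataset a per-class (stratified) split would be the more careful choice.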
Experimental environment
Specific settings of experimental environment.
Parameter setting
Specific experimental parameters of the LSTM network model used in the experiment.
Setting these parameters appropriately allows the model to train effectively and generate high-quality images. The parameters need to be adjusted according to the characteristics of the specific task and dataset to achieve the best experimental results.
Performance evaluation
To evaluate the proposed DLSTM-based automatic IG model, its performance is compared with that of a traditional automatic IG model based on a random forest network (RFAGM). Performance is evaluated using PSNR, Structural Similarity Index (SSIM), IG accuracy, and IG speed, together with the authenticity index and the overlap index of IG. Figure 2 shows the trend of PSNR for the different IG models, while Figure 3 shows the trend of SSIM. These charts can be used to compare and evaluate the IG ability of the models.
Figure 2. Variation trend of PSNR data of different types of IG models.
Figure 3. Changing trend of SSIM data of different types of IG models.

Figure 2 shows that the PSNR of both models grows as the number of iterations increases, indicating steadily improving image generation quality. The PSNR of the DLSTM model is higher than that of the RFAGM model at every iteration, which indicates that the DLSTM model reconstructs images more accurately than the RFAGM model.
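For reference, PSNR can be computed as in the sketch below. It uses the standard definition, 10·log10(MAX² / MSE), and assumes 8-bit grayscale images given as nested lists; the paper does not state its image format, so `max_value=255.0` is an assumption.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if len(ref) != len(gen) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under the 8-bit assumption, a PSNR of 20.5 dB corresponds to a mean squared error of roughly 580.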
Figure 3 shows that the SSIM of the DLSTM model is higher than that of the RFAGM model at every iteration, which indicates that the DLSTM model preserves the structural information of the image better than the RFAGM model. Specifically, at 600 iterations the SSIM of the DLSTM model reaches its maximum reported value of 1.2, and that of the RFAGM model reaches its maximum of 0.8. In addition, Figure 4 shows the IG accuracy curves of the different IG models, while Figure 5 shows their IG speed curves. These charts can be used to compare and evaluate the accuracy and speed of the models in IG.
Figure 4. Variation curves of IG accuracy data of different types of IG models.
Figure 5. Variation curve of IG speed data of different types of IG models.
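The SSIM values compared in Figure 3 can be related to the standard formula through the sketch below. It is a simplified, single-window version computed over the whole image rather than the usual sliding-window average, and it assumes 8-bit grayscale input and the conventional stabilizing constants (0.01 and 0.03 times the dynamic range); none of these details are given in the paper.

```python
def ssim_global(x, y, max_value=255.0):
    """Single-window SSIM over two equally sized grayscale images (simplified sketch)."""
    a = [p for row in x for p in row]
    b = [p for row in y for p in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Under this standard definition the index equals 1 only for identical images and cannot exceed 1, so values reported on a different scale would imply a normalization other than the one sketched here.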

Figure 4 shows that the generation accuracy of the DLSTM model is higher than that of the RFAGM model at every iteration, which indicates that the DLSTM model generates images more effectively. At 600 iterations, the generation accuracy of both models reaches its highest value, and the DLSTM model remains clearly ahead of the RFAGM model.
Figure 5 compares the generation speed of DLSTM and RFAGM at different iterations. The DLSTM model is slower than the RFAGM model at every iteration, and the speed of both models stabilizes as the number of iterations increases. However, the two models differ in generation quality. Both DLSTM and RFAGM are described as IG models based on DL and Recurrent Neural Networks (RNN) that use the memory ability of RNN and the structure of LSTM to capture sequential and structural information in images. In addition, Figure 6 shows the trend of the authenticity index of IG for the different models, while Figure 7 shows the trend of the overlap index of IG.
Figure 6. Change trend of authenticity index data of IG based on different types of IG models.
Figure 7. Change trend of overlap index data of IG in different types of IG models.

In Figure 6, the authenticity indices of both the DLSTM and RFAGM models fluctuate as the number of iterations increases, but the fluctuation range is larger for the DLSTM model and smaller for the RFAGM model. This shows that the DLSTM model is more sensitive to the number of iterations, while the RFAGM model is more stable. The authenticity index of neither model shows an obvious upward or downward trend; instead, it fluctuates periodically. This may mean that both models have certain limitations and uncertainties when generating images.
In Figure 7, the overlap index of both models rises with the number of iterations, but the RFAGM model rises faster and is higher than the DLSTM model at every iteration. At 200 iterations, the overlap index of the DLSTM model is 0.404, while that of the RFAGM model is 0.494, a difference of 0.09. This gap may mean that the DLSTM model still introduces some noise or distortion when generating images, resulting in a smaller intersection area between the generated and original images, whereas the RFAGM model may produce clearer and smoother images, resulting in a larger intersection area. On the overlap index, therefore, the RFAGM model outperforms the DLSTM model at every iteration, and both models improve the similarity between the generated and original images as the number of iterations increases.
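The paper does not define the overlap index. Since the discussion above refers to the intersection area between generated and original images, one plausible reading is an intersection-over-union (IoU) score on binarized images. The sketch below follows that assumption; the function names and the threshold of 128 are hypothetical.

```python
def binarize(image, threshold=128):
    """Turn a grayscale image into a 0/1 mask (the threshold is an assumed value)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def overlap_index(mask_a, mask_b):
    """Intersection over union of two binary masks of the same size."""
    a = [bool(p) for row in mask_a for p in row]
    b = [bool(p) for row in mask_b for p in row]
    if len(a) != len(b):
        raise ValueError("masks must be the same size")
    intersection = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)
    # Two empty masks overlap perfectly by convention.
    return intersection / union if union else 1.0
```

For example, `overlap_index(binarize(original), binarize(generated))` would score a generated image against its original, with 1.0 meaning the foreground regions coincide exactly.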
Discussion
This research can be compared with previous work. Singh et al. (2021) 45 reviewed methods and applications of medical IG using GANs and concluded that GANs have potential application value in medical IG. Chlap et al. (2021) 46 reviewed DL data augmentation techniques applied to medical images, summarizing the various techniques and showing that they can improve the performance of DL in medical image analysis. Loverdos et al. (2022) 47 investigated automatic image segmentation and crack detection for brick walls using machine learning. The results showed that the method can effectively detect cracks in brick walls, which facilitates maintenance and repair. Tsuneki (2022) 48 used a DL algorithm to analyze and classify gesture images and thereby realized gesture recognition. The results showed that the DL method performs well in gesture recognition and can be widely used in intelligent interaction, virtual reality, and other fields. Chun et al. (2022) 49 studied the application of DL-based natural language processing to text sentiment analysis and found that it performed well. Barrera et al. (2023) 50 reviewed the application of DL in image semantic segmentation, showing that DL has achieved remarkable results in this task and has been widely used in many application fields.
Taken together, this and previous related work show that DL has made remarkable progress and has broad application potential in fields such as medical IG, data augmentation, image segmentation, gesture recognition, text sentiment analysis, and image semantic segmentation. These methods offer useful reference value for improving the accuracy of automatic IG.
Conclusion
Research contribution
This paper's primary contribution is the LSTM-based image generation technique it proposes. The innovation lies in introducing the LSTM model into the encoder to make full use of its long-term memory ability and to extract the semantic information of the image. The experimental results demonstrate that, compared with the conventional random forest network model, the proposed model produces images of higher quality and accuracy: its PSNR and structural similarity are better than those of the traditional model, and its accuracy is also significantly improved. This offers a useful point of reference for improving DL-based image generation algorithms. By examining the impact of LSTM on IG tasks, the paper also provides a basis for future LSTM-based IG studies.
Future works and research limitations
This paper also has some shortcomings. First, the experimental dataset is limited, and the model needs to be verified on larger datasets in the future. Second, the model still needs to generate a more diverse set of images, for which more random factors could be taken into account. Third, the model requires a lengthy training period, so computational efficiency needs to be improved. Future research can optimize the model's computation and improve training efficiency through parallel computing and other means. In general, research on LSTM-based IG is still at an early stage, and it can be improved in many respects, such as computational efficiency and generation quality, to produce better application results.
Statements and declarations
Footnotes
Conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by 2022s Zhejiang Province provincial first-class courses and the first batch of first-class undergraduate courses in labor education and Office of Zhejiang Provincial Department of Education Teaching projects research result (2022sylkc14). This work was also supported by the Ministry of Education University-Industry Collaborative Education Program (Grant No.230801444284211).
