Automatic image generation based on deep learning long short-term memory network

Abstract

As deep learning technologies develop rapidly, improving the quality and variety of generated images has become a focus of computer vision research. This paper proposes an image generation approach based on a deep learning (DL) long short-term memory (LSTM) network, which uses the long-term memory capacity of the LSTM to extract richer semantic information and improve generation quality. A generation model that uses an LSTM as the encoder is designed and studied. The experiment is conducted on a computer equipped with a high-speed Graphics Processing Unit (GPU) and large memory, using a dataset containing multi-class images. Taking a random forest network model as the control group, this paper evaluates the peak signal-to-noise ratio (PSNR), structural similarity, and generation accuracy. The results show that the proposed model’s PSNR and structural similarity are superior to those of the control group: the PSNR reaches 20.5 dB and the structural similarity reaches 0.8 after 100 iterations. After 600 iterations, the image generation accuracy reaches 99%, which is also higher than that of the control group. These results indicate that the LSTM-based DL model can extract an image’s long-term semantic information and produce images of higher quality and accuracy. This paper provides a reference for refining DL-based image generation techniques.

Keywords

deep learning, long short-term memory, automatic image generation, random forest model, peak signal-to-noise ratio

Introduction

Research background and motivations

Image generation has long been one of the most significant areas of study in computer vision.^1,2 Thanks to the rapid development of DL technology, DL models have brought major advances to the image generation task. The LSTM network is a DL model designed for sequence data that has proven highly effective in time series analysis and natural language processing. Against this background, this paper explores automatic image generation (IG) methods based on DL LSTM networks to improve the quality and diversity of generated images.^3–5

Generative Adversarial Networks (GANs) have achieved great success in IG. However, traditional GAN models still face challenges in generating high-quality images.^6–8 One of them is balancing the authenticity and diversity of generated images: to generate realistic images, models tend to produce samples similar to the training data, which leaves the generated images insufficiently diverse. As a neural network that can retain long-term dependency information, the LSTM model has certain advantages in generating diverse outputs and is expected to alleviate this problem in IG.^9,10

Furthermore, most current image generation techniques are built on an encoder-decoder structure, in which the encoder maps the input image to a vector representation in the latent space and the decoder maps this representation back to the original image space.^11,12 Nonetheless, the classic encoder-decoder structure often struggles with complicated image content and structure.^13–15 This paper addresses the issue by introducing the LSTM model as an encoder component. By learning long-term dependencies within the image, the encoder extracts richer semantic features, which improves the capacity to generate images.

Research objectives

In light of the above context, the purpose of this paper is to present an image generation technique based on a DL LSTM network and to achieve varied, high-quality IG by combining the memory capacity of LSTM with the benefits of GANs. The paper implements a novel LSTM-based image generation model. In many cases, the encoder in the conventional encoder-decoder structure cannot accurately capture the long-term dependencies in an image. To extract deeper semantic features from the image, this paper incorporates the LSTM model into the encoder. The research findings have theoretical and practical significance for computer vision and support the development of more capable image generation models.

Literature review

The theme of this paper is automatic IG based on a DL LSTM network. As an important DL model, the LSTM network has shown excellent performance in many application fields. Kwak et al. (2020)¹⁶ studied the potential of bidirectional LSTM networks for crop classification and showed that the method could classify crops effectively. Nabati et al. (2020)¹⁷ studied video captioning using boosted and parallel LSTM networks; introducing boosting and a parallel mechanism into the model improved captioning performance. Hao et al. (2020)¹⁸ studied multi-sensor bearing fault diagnosis with a one-dimensional convolutional LSTM network and found that the method could diagnose bearing faults effectively. Haputhanthri et al. (2021)¹⁹ studied solar irradiance prediction for virtual power plants using multi-modal LSTM networks, improving prediction accuracy by combining data from various sensors. Zhang et al. (2021)²⁰ studied a weather radar echo prediction method based on a convolutional neural network (CNN) and an LSTM network for sustainable electronic agriculture. Their results showed that the combined CNN and LSTM model predicted the weather radar echo effectively and provided support for sustainable e-agriculture. Lindemann et al. (2021)²¹ surveyed the characteristics and applications of LSTM networks in time series prediction and concluded that LSTM networks perform well on this task.

The above research shows the importance and wide application of the LSTM network in DL. By exploiting the memory of LSTM and its ability to model long-term dependencies, researchers have made remarkable progress in automatic image generation, classification, caption generation, and fault diagnosis.^22–25 Given the ongoing advances in DL technology, it is reasonable to expect that LSTM and its associated applications will open up further possibilities.^26–28

Research model

Comprehensive training and analysis of IG model

In the dataset preparation and management stage, large-scale datasets of real images are gathered, then rigorously screened and preprocessed to guarantee the quality of the training data and remove noise and unnecessary variation.^29–31 The LSTM-based image generation model and the data coding are the crucial components of the design. By introducing the LSTM model into the encoder, the model can make full use of its long-term memory ability, capture the long-term dependencies in the image, and extract richer semantic features. This architecture improves the quality and variety of the generated images and helps the model handle images with complicated content and structure. The model is trained with the GAN method: adversarial training between the generator and the discriminator drives the model to learn to produce authentic and varied images. To achieve the best generation quality, the training of the generator and discriminator is guided by an appropriate loss function and optimization method.^32–35
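The adversarial objective mentioned above can be illustrated with a minimal sketch. The paper does not state which loss function it uses, so the binary cross-entropy formulation below (with the common non-saturating generator loss) is an assumption, and the function names `bce`, `discriminator_loss`, and `generator_loss` are hypothetical.

```python
import math
from typing import Sequence


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Push D(real) toward 1 and D(fake) toward 0 (assumed objective)."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: push D(fake) toward 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # Discriminator outputs (probabilities of "real") for two real and two generated images.
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_loss([0.2, 0.1]))
```

In this formulation the generator loss falls as the discriminator assigns higher probabilities to generated images, which is the competitive pressure the training procedure relies on.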

Design of data coding and IG model based on LSTM

This paper focuses on the design of the data coding and the IG model based on a DL LSTM network. Introducing LSTM as part of the encoder is intended to improve the IG model’s understanding of complex image content and structure and thereby realize high-quality, diverse IG.^36–39 The core of the design is to embed the LSTM model into the encoder to make full use of its long-term memory ability. During training, the generator and discriminator compete with one another. This gradually improves the generator’s ability to produce realistic images and the discriminator’s ability to distinguish real images from generated ones.^40,41 This adversarial scheme encourages the model to learn to produce authentic and diverse images. Figure 1 depicts the structure of the LSTM-based data coding and IG model.

Figure 1.

Structural diagram of the LSTM-based data coding and IG model.
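The idea of an LSTM encoder can be illustrated with a minimal sketch. It assumes, since the paper does not specify this, that each image row is fed to the LSTM as one time step and that the final hidden state serves as the latent code. The names `LSTMCell` and `encode_image`, the random weight initialization, and the tiny sizes are hypothetical; the paper's model uses 2 LSTM layers with a state dimension of 256 (Table 2).

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and one bias vector per gate, small random weights (assumed init).
        self.w = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def _affine(self, gate: str, xh: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, xh)) + b
                for row, b in zip(self.w[gate], self.b[gate])]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        xh = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(a) for a in self._affine("i", xh)]
        f = [sigmoid(a) for a in self._affine("f", xh)]
        o = [sigmoid(a) for a in self._affine("o", xh)]
        g = [math.tanh(a) for a in self._affine("g", xh)]
        # The cell state carries long-term information across rows.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode_image(rows: List[Vector], cell: LSTMCell) -> Vector:
    """Read the image row by row and return the final hidden state as the code."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for row in rows:
        h, c = cell.step(row, h, c)
    return h


if __name__ == "__main__":
    # A 3x4 toy image with pixel values normalized to [0, 1].
    image = [[0.0, 0.5, 1.0, 0.5], [1.0, 0.0, 0.0, 1.0], [0.2, 0.4, 0.6, 0.8]]
    code = encode_image(image, LSTMCell(input_size=4, hidden_size=8))
    print(len(code))
```

A decoder (the generator) would then map this latent code back to image space; that part is omitted here.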

Experimental design and performance evaluation

Experimental materials

An extensive image dataset is used as the experimental material to ensure the reliability and generalization ability of the results. The dataset contains different categories and themes, covering natural scenery, portraits, animals, objects, and other fields. This diverse dataset is chosen to test the IG ability of the model across different scenes and contents. The images are pre-processed to guarantee the quality and uniformity of the dataset.⁴² To support the training, tuning, and assessment of the model, the data are divided into a training set, a validation set, and a test set. The model’s parameters are learned on the training set, hyperparameter selection and optimization are done on the validation set, and performance is assessed and compared on the test set.
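The three-way split described above can be sketched as follows, using the 0.8/0.1/0.1 proportions listed in Table 2. The function name `split_dataset`, the shuffling step, and the fixed seed are assumptions, as the paper does not describe how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train: float = 0.8, val: float = 0.1,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into training, validation, and test subsets."""
    if not (0.0 < train < 1.0 and 0.0 <= val < 1.0 and train + val < 1.0):
        raise ValueError("train and val must be fractions with train + val < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train_set, val_set, test_set


if __name__ == "__main__":
    # Hypothetical image identifiers standing in for the real dataset.
    image_ids = list(range(1000))
    train_set, val_set, test_set = split_dataset(image_ids)
    print(len(train_set), len(val_set), len(test_set))
```

Keeping the three subsets disjoint is what allows the test set to give an unbiased comparison between models.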

Experimental environment

The experiment is run on a high-performance desktop computer equipped with a GPU with large memory and cache, which provides the computing power and data processing speed needed for the task. The computer’s processor and memory can handle the demands of DL model training and speed up the generation and processing of image data.⁴³ Table 1 lists the configuration of the experimental computer.

Table 1.

Specific settings of experimental environment.

Item	Parameter
Processor	Intel Core i7-9700K
Memory (RAM)	32 GB DDR4
GPU	NVIDIA GeForce RTX 2080 Ti
Storage	1 TB SSD
Operating system	Windows 10

Parameter setting

This section introduces the parameter settings used in the experiment. The selection and setting of parameters have an important influence on the training and IG of DL LSTM networks. Reasonable parameter selection can improve the model performance and the quality of generated results.⁴⁴ Table 2 shows the settings of the model used in the experiment.

Table 2.

Specific experimental parameters of the LSTM network model used in the experiment.

Parameter	Value
LSTM layer number	2
Hidden state dimension	256
Learning rate	0.001
Iterations	10,000
Batch size	64
Number of samples generated	100
Optimizer	Adam
Training set proportion	0.8
Validation set proportion	0.1
Test set proportion	0.1

Setting these parameters reasonably allows the model to train effectively and generate high-quality images. The parameters need to be adjusted according to the characteristics of the specific task and dataset to achieve the best experimental results.
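The settings in Table 2 can be collected in a small configuration object, shown below together with a single Adam update to illustrate how the learning rate enters training. This is a sketch: the names `TrainConfig` and `adam_step` are hypothetical, and the Adam constants `beta1`, `beta2`, and `eps` are the commonly used defaults, which the paper does not report.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Values taken from Table 2."""
    lstm_layers: int = 2
    hidden_size: int = 256
    learning_rate: float = 0.001
    iterations: int = 10_000
    batch_size: int = 64
    num_generated_samples: int = 100
    optimizer: str = "Adam"
    train_fraction: float = 0.8
    val_fraction: float = 0.1
    test_fraction: float = 0.1


def adam_step(param: float, grad: float, m: float, v: float, t: int, lr: float,
              beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    cfg = TrainConfig()
    # Hypothetical scalar parameter and gradient, for illustration only.
    param, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1, lr=cfg.learning_rate)
    print(param)
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient scale, which is one reason a value such as 0.001 is a common starting point for Adam.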

Performance evaluation

To study the proposed automatic IG model based on the DL LSTM network (DLSTM), its performance is compared with that of a traditional automatic IG model based on a random forest network (RFAGM). Performance is evaluated in terms of PSNR, the Structural Similarity Index (SSIM), IG accuracy, IG speed, the authenticity index of IG, and the overlap index of IG. Figure 2 shows how the PSNR of the different IG models changes over iterations, while Figure 3 shows the corresponding trend for SSIM. These charts can be used to compare and evaluate the IG ability of the models.

Figure 2.

Variation trend of PSNR data of different types of IG models.

Figure 3.

Changing trend of SSIM data of different types of IG models.

Figure 2 shows that the PSNR of both models grows as the number of iterations increases, indicating steadily improving image generation quality. The PSNR of the DLSTM model is higher than that of the RFAGM model at every iteration, indicating that the DLSTM model produces images that are closer to the originals than those of the RFAGM model.
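For reference, PSNR is computed from the mean squared error (MSE) between a reference image and a generated image as PSNR = 10 log10(MAX^2 / MSE), where MAX is the peak pixel value. The sketch below implements this definition for grayscale images stored as nested lists; the function names and the default peak value of 255 (8-bit images) are assumptions, since the paper does not state its image format.

```python
import math
from typing import List

Image = List[List[float]]


def mse(reference: Image, generated: Image) -> float:
    """Mean squared error over all pixels of two equally sized images."""
    total = 0.0
    count = 0
    for ref_row, gen_row in zip(reference, generated):
        for r, g in zip(ref_row, gen_row):
            total += (r - g) ** 2
            count += 1
    return total / count


def psnr(reference: Image, generated: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, generated)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    reference = [[100.0, 120.0], [140.0, 160.0]]
    generated = [[110.0, 115.0], [150.0, 155.0]]
    print(psnr(reference, generated))
```

Under the 8-bit assumption, the PSNR of 20.5 dB reported in the abstract corresponds to an MSE of roughly 580, that is, a root-mean-square pixel error of about 24 gray levels.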

Figure 3 shows that the SSIM of the DLSTM model is higher than that of the RFAGM model at every iteration, which indicates that the DLSTM model preserves the structural information of the image better than the RFAGM model. Specifically, at 600 iterations the SSIM of the DLSTM model reaches its reported maximum of 1.2, and that of the RFAGM model reaches its maximum of 0.8. Note that the standard SSIM is bounded above by 1, so the reported value of 1.2 likely reflects a different scaling of the plotted index. In addition, Figure 4 shows the IG accuracy curves of the different IG models, while Figure 5 shows their IG speed curves. These charts can be used to compare and evaluate the accuracy and speed of the models in IG.
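As a reminder of what SSIM measures, the sketch below computes a simplified, single-window SSIM over the whole image from the means, variances, and covariance of the two images. This is an assumption-laden simplification: standard SSIM averages the same expression over local sliding windows, the paper does not say which variant it uses, and the function name `global_ssim` and the 8-bit peak value are hypothetical. The constants 0.01 and 0.03 are the usual stabilizing factors.

```python
from typing import List

Image = List[List[float]]


def global_ssim(x: Image, y: Image, max_value: float = 255.0) -> float:
    """Single-window SSIM computed over all pixels of two equally sized images."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((p - mu_x) ** 2 for p in xs) / n
    var_y = sum((p - mu_y) ** 2 for p in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [[0.0, 255.0], [255.0, 0.0]]
    inverted = [[255.0, 0.0], [0.0, 255.0]]
    print(global_ssim(reference, reference))
    print(global_ssim(reference, inverted))
```

Identical images score 1, the maximum of the index, while an inverted copy scores close to -1 because its structure is anti-correlated with the reference.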

Figure 4.

Variation curves of IG accuracy data of different types of IG models.

Figure 5.

Variation curve of IG speed data of different types of IG models.

Figure 4 shows that the generation accuracy of the DLSTM model is higher than that of the RFAGM model at every iteration, which indicates that the DLSTM model generates images more effectively than the RFAGM model. Both models reach their highest generation accuracy at 600 iterations. Across all iterations, the DLSTM model is significantly better than the RFAGM model in generation accuracy.

Figure 5 compares the generation speed of DLSTM and RFAGM at different iterations. The DLSTM model is slower than the RFAGM model at every iteration, and the speed of both models stabilizes as the number of iterations increases. The two models nevertheless differ in generation quality. Both DLSTM and RFAGM are IG models based on DL and the Recurrent Neural Network (RNN); both use the memory ability of the RNN and the structure of LSTM to capture sequential and structural information in images. In addition, Figure 6 shows how the authenticity index of IG changes for the different IG models, while Figure 7 shows the corresponding trend for the overlap index of IG.

Figure 6.

Change trend of authenticity index data of IG based on different types of IG models.

Figure 7.

Change trend of overlap index data of IG in different types of IG models.

In Figure 6, the authenticity indices of both the DLSTM and RFAGM models fluctuate as the number of iterations increases, but the fluctuation range is larger for the DLSTM model and smaller for the RFAGM model. This shows that the DLSTM model is more sensitive to the number of iterations, while the RFAGM model is more stable. Neither authenticity index shows an obvious upward or downward trend; both fluctuate periodically. This may mean that both models have certain limitations and uncertainties when generating images.

In Figure 7, the overlap index of both the DLSTM and RFAGM models increases with the number of iterations, but the RFAGM model rises faster and is higher than the DLSTM model at every iteration. At 200 iterations, the overlap index of the DLSTM model is 0.404, while that of the RFAGM model is 0.494, a difference of 0.09. This gap may mean that the DLSTM model still introduces some noise or distortion when generating images, resulting in a smaller intersection area between the generated images and the original images, whereas the RFAGM model may produce clearer and smoother images, resulting in a larger intersection area. Therefore, in terms of the overlap index, the RFAGM model is superior to the DLSTM model at every iteration, and both models improve the similarity between the generated image and the original image as iterations increase.
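The paper does not define the overlap index formally. One plausible reading, given the reference to the intersection area between generated and original images, is an intersection-over-union score on binarized images. The sketch below follows that reading only as an assumption; the function name `overlap_index` and the threshold of 0.5 on normalized pixel values are hypothetical.

```python
from typing import List

Image = List[List[float]]


def overlap_index(original: Image, generated: Image, threshold: float = 0.5) -> float:
    """Intersection over union of the foreground masks of two equally sized images."""
    intersection = 0
    union = 0
    for orig_row, gen_row in zip(original, generated):
        for o, g in zip(orig_row, gen_row):
            in_original = o >= threshold
            in_generated = g >= threshold
            intersection += in_original and in_generated
            union += in_original or in_generated
    # Two empty masks overlap perfectly by convention.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    original = [[1.0, 1.0], [0.0, 0.0]]
    generated = [[1.0, 0.0], [1.0, 0.0]]
    print(overlap_index(original, generated))
```

Under this reading, noise or distortion in a generated image moves pixels across the threshold, which shrinks the intersection relative to the union and lowers the score.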

Discussion

Comparing this research with previous work, Singh et al. (2021)⁴⁵ reviewed the methods and applications of medical IG using GANs and concluded that GANs have potential application value in medical IG. Chlap et al. (2021)⁴⁶ reviewed data augmentation techniques for DL applied to medical images, summarizing the available techniques and showing that they can improve the performance of DL in medical image analysis. Loverdos et al. (2022)⁴⁷ investigated automatic image segmentation and crack detection for brick walls using machine learning; the method detected cracks in brick walls effectively, which facilitates maintenance and repair. Tsuneki (2022)⁴⁸ used a DL algorithm to analyze and classify gesture images and realized gesture recognition. The results showed that the DL method performs well in gesture recognition and can be widely used in intelligent interaction, virtual reality, and other fields. Chun et al. (2022)⁴⁹ studied the application of DL-based natural language processing to text sentiment analysis and found that it performed well. Barrera et al. (2023)⁵⁰ reviewed the application of DL in image semantic segmentation and showed that DL has achieved remarkable results there and has been widely used in many application fields.

The above research and previous related work show that DL has made remarkable progress and has broad application potential in fields such as medical IG, data augmentation, image segmentation, gesture recognition, text sentiment analysis, and image semantic segmentation. These methods provide a useful reference for improving the accuracy of automatic IG.

Conclusion

Research contribution

This paper’s primary contribution is the LSTM-based image generation technique it proposes. The innovation lies in introducing the LSTM model into the encoder to make full use of its long-term memory ability and extract the semantic information of the image. The experimental results show that, compared with the conventional random forest network model, the proposed model produces images of higher quality and accuracy: its PSNR and structural similarity are better than those of the traditional model, and its accuracy is also significantly improved. This offers a valuable point of reference for improving DL-based image generation algorithms. The paper also demonstrates the effect of LSTM on IG tasks, providing a basis for future LSTM-based IG studies.

Future works and research limitations

This paper also has some shortcomings. First, the experimental dataset is limited, and the model needs to be verified on a larger dataset in the future. Second, the model still needs to generate a more diverse set of images, for which more random factors could be taken into consideration. Finally, the model requires a lengthy training period, so its computational efficiency needs to be improved. Future research can optimize the computation and improve training efficiency through parallel computing and other means. Overall, research on LSTM-based IG is still at an early stage, and it can be improved in aspects such as computational efficiency and generation quality to produce better application results.

Statements and declarations

Footnotes

Conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by 2022s Zhejiang Province provincial first-class courses and the first batch of first-class undergraduate courses in labor education and Office of Zhejiang Provincial Department of Education Teaching projects research result (2022sylkc14). This work was also supported by the Ministry of Education University-Industry Collaborative Education Program (Grant No.230801444284211).

References

1. Guo, Liu, et al. Attention mechanisms in computer vision: A survey. Comput Vis Media 2022; 8(3): 331–368.
2. Yin. Computer vision and machine learning applied in the mushroom industry: A critical review. Comput Electron Agric 2022; 198: 107015.
3. Muhammad, Kusumaningrum, Wibowo. Sentiment analysis using Word2vec and long short-term memory (LSTM) for Indonesian hotel reviews. Proc Comput Sci 2021; 179: 728–735.
4. Liu, Wan, et al. A long short-term memory-based model for greenhouse climate prediction. Int J Intell Syst 2022; 37(1): 135–151.
5. Van, Mosquera, Nápoles. A review on the long short-term memory model. Artif Intell Rev 2020; 53: 5929–5955.
6. Yang, Singh, Tavakkoli, et al. CNN-LSTM deep learning architecture for computer vision-based modal frequency detection. Mech Syst Signal Process 2020; 144: 106885.
7. Ding, Gao, et al. Real-time anomaly detection based on long short-term memory and Gaussian mixture model. Comput Electr Eng 2019; 79: 106458.
8. Chen, Zhu, Steibel, et al. Recognition of aggressive episodes of pigs based on convolutional neural network and long short-term memory. Comput Electron Agric 2020; 169: 105166.
9. Wang, She, Ward. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput Surv 2021; 54(2): 1–38.
10. Voxid, Xolbek, Kamoliddin. Sorting the object based on neural networks computer vision algorithm of the system and software. Ijtimoiy Fanlarda Innovasiya Onlayn Ilmiy Jurnali 2023; 3(1): 67–69.
11. Park, Huh, et al. Review on generative adversarial networks: focusing on computer vision and its applications. Electronics 2021; 10(10): 1216.
12. Robson, Denholm, Coffey. Automated processing and phenotype extraction of ovine medical images using a combined generative adversarial network and computer vision pipeline. Sensors 2021; 21(21): 7268.
13. Sampath, Maurtua, Aguilar Martin, et al. A survey on generative adversarial networks for imbalance problems in computer vision tasks. J Big Data 2021; 8: 1–59.
14. Chai, Zeng, et al. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Machine Learning with Applications 2021; 6: 100134.
15. Islam MMM, Kim. Vision-based autonomous crack detection of concrete structures using a fully convolutional encoder–decoder network. Sensors 2019; 19(19): 4251.
16. Kwak, Park, Ahn, et al. Potential of bidirectional long short-term memory networks for crop classification with multitemporal remote sensing images. Korean Journal of Remote Sensing 2020; 36(4): 515–525.
17. Nabati, Behrad. Video captioning using boosted and parallel Long Short-Term Memory networks. Comput Vis Image Understand 2020; 190: 102840.
18. Hao, et al. Multisensor bearing fault diagnosis based on one-dimensional convolutional long short-term memory networks. Measurement 2020; 159: 107802.
19. Haputhanthri, De Silva, Sierla, et al. Solar irradiance nowcasting for virtual power plants using multimodal long short-term memory networks. Front Energy Res 2021; 9: 722212.
20. Zhang, Huang, Liu, et al. Weather radar echo prediction method based on convolution neural network and long short-term memory networks for sustainable e-agriculture. J Clean Prod 2021; 298: 126776.
21. Lindemann, Müller, Vietz, et al. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021; 99: 650–655.
22. Hua, Zhao, et al. Deep learning with long short-term memory for time series prediction. IEEE Commun Mag 2019; 57(6): 114–119.
23. Punia, Nikolopoulos, Singh, et al. Deep learning with long short-term memory networks and random forests for demand forecasting in multi-channel retail. Int J Prod Res 2020; 58(16): 4964–4979.
24. Zhou, Luo, Feng, et al. Long-short-term-memory-based crop classification using high-resolution optical images and multi-temporal SAR data. GIScience Remote Sens 2019; 56(8): 1170–1191.
25. Lyu, Huang, et al. Multiscale echo self-attention memory network for multivariate time series classification. Neurocomputing 2023; 520: 60–72.
26. Sun, Fang. Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. Int J Rem Sens 2019; 40(2): 593–614.
27. Rachman, Mubarok, Dewi ENF, et al. Implementation of convolutional neural network and long short-term memory algorithms in human activity recognition based on visual processing video. JOIV: Int J Inform Visualization 2023; 7(2): 494–501.
28. Burduja, Ionescu, Verga. Accurate and efficient intracranial hemorrhage detection and subtype classification in 3D CT scans with convolutional and long short-term memory neural networks. Sensors 2020; 20(19): 5611.
29. Topal, Chitic, Leprévost. One evolutionary algorithm deceives humans and ten convolutional neural networks trained on ImageNet at image recognition. Appl Soft Comput 2023; 143: 110397.
30. Zhang, Wang, et al. Deep learning based online metallic surface defect detection method for wire and arc additive manufacturing. Robot Comput Integrated Manuf 2023; 80: 102470.
31. Zhang, Wang, Jiang, et al. Diversifying tire-defect image generation based on generative adversarial network. IEEE Trans Instrum Meas 2022; 71: 1–12.
32. Saharia, Chan, et al. Cascaded diffusion models for high fidelity image generation. J Mach Learn Res 2022; 23(1): 2249–2281.
33. Alrashedy HHN, Almansour, Ibrahim, et al. BrainGAN: brain MRI image generation and classification framework using GAN architectures and CNN models. Sensors 2022; 22(11): 4297.
34. Ghassemi, Shoeibi, Rouhani. Deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images. Biomed Signal Process Control 2020; 57: 101678.
35. Zhang. 3D model generation on architectural plan and section training through machine learning. Technologies 2019; 7(4): 82.
36. Fang, Zhang, Ding, et al. A new sequential image prediction method based on LSTM and DCGAN. Comput Mater Continua (CMC) 2020; 64(1): 217–231.
37. Ding, Zhang, Jia, et al. Where to prune: Using LSTM to guide data-dependent soft pruning. IEEE Trans Image Process 2020; 30: 293–304.
38. Xiao, Xue, Shen, et al. A new attention-based LSTM for image captioning. Neural Process Lett 2022; 54(4): 3157–3171.
39. Luo. PFST-LSTM: A spatiotemporal LSTM model with pseudoflow prediction for precipitation nowcasting. IEEE J Sel Top Appl Earth Obs Rem Sens 2020; 14: 843–857.
40. Teng, Duan, Liu, et al. Global to local: Clip-LSTM-based object detection from remote sensing images. IEEE Trans Geosci Rem Sens 2021; 60: 1–13.
41. Cao, Zhou, QMJ, et al. Coverless information hiding based on the generation of anime characters. J Image Video Proc 2020; 2020: 1–15.
42. Chen, Zhang, Geng, et al. Strong spatiotemporal radar echo nowcasting combining 3DCNN and bi-directional convolutional LSTM. Atmosphere 2020; 11(6): 569.
43. Zhang. Dual attention on pyramid feature maps for image captioning. IEEE Trans Multimed 2021; 24: 1775–1786.
44. Chen, Liu, Xie, et al. Sofgan: A portrait image generator with dynamic styling. ACM Trans Graph 2022; 41(1): 1–26.
45. Singh, Raza. Medical image generation using generative adversarial networks: A review. Health Inform: A Comp Perspect Healthcare 2021; 77–96.
46. Chlap, Min, Vandenberg, et al. A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol 2021; 65(5): 545–563.
47. Loverdos, Sarhosis. Automatic image-based brick segmentation and crack detection of masonry walls using machine learning. Autom ConStruct 2022; 140: 104389.
48. Tsuneki. Deep learning models in medical image analysis. J Oral Biosci 2022; 64(3): 312–320.
49. Chun, Yamane, Maemura. A deep learning-based image captioning method to automatically generate comprehensive explanations of bridge damage. Computer aided Civil Eng 2022; 37(11): 1387–1401.
50. Barrera, Merino, Molina, et al. Automatic generation of artificial images of leukocytes and leukemic cells using generative adversarial networks (syntheticcellgan). Comput Methods Progr Biomed 2023; 229: 107314.