Abstract
Abstract art, a highly popular artistic genre, often serves as a canvas for expressing the artist's emotions. Numerous researchers have endeavoured to analyse abstract art through the application of machine and deep learning techniques, focusing on tasks such as edge detection, brushstroke analysis, and emotion recognition. This paper presents an investigation of a wide distribution of abstract paintings using Generative Adversarial Networks (GANs). GANs can learn and reproduce a distribution, enabling researchers to explore and study the generated image space effectively. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The proposed mDCGAN incorporates careful adjustments in layer configurations, together with tailored optimisation techniques and loss functions, to combat issues such as mode collapse and vanishing gradients and thereby improve stability and realism in art generation. The evaluation results of mDCGAN demonstrate a marked reduction in mode collapse occurrences compared with the standard DCGAN configuration. Further, this paper explores the generated latent space by performing random walks to understand vector relationships between brush strokes and colours in the abstract art space, and presents a statistical analysis of the unstable outputs that appear after a certain period of GAN training, testing whether they differ significantly from the stable outputs. These findings validate the effectiveness of the proposed approach, emphasising its potential to revolutionise the field of digital art generation and the digital art ecosystem.
Introduction
Abstract art generation using deep learning provides a unique platform for exploring novel visual aesthetics and enables automated generation of diverse, non-representational visual content for applications such as design, multimedia, and digital media. Pattern recognition is essential in art generation because it enables the identification and incorporation of artistic elements, such as brush strokes, colour patterns, and textures, which are crucial for creating authentic and visually appealing artwork. Pattern recognition, a machine learning technique, retains useful features while eliminating redundant ones. 1,2 It allows the generation of art that reflects the essence of different artistic styles and traditions, contributing to the richness and diversity of artistic output. Pattern recognition is based on the generalisation of objects: real-world items are classified by identifying shared common features. This technique is invaluable in various fields such as text pattern recognition, 3,4 fingerprint scanning, 5 data compression, 6 forecasting, 7 seismic activity analysis, 8 audio recognition 9 and healthcare. 10
In our study, pattern recognition was implemented through generative modelling 11 using Generative Adversarial Networks (GANs). 12 Generative models can learn and explain the distribution of large amounts of data in various forms such as audio, images and words. Once the model is trained, it can generate new data by extracting samples from the derived data or from random noise. These models have the ability to learn and model underlying categories, dimensions, and other aspects without specific programming. There are different generative models tailored for specific requirements, including GANs for style transfer in images, 13 Hidden Markov Models (HMMs) for speech recognition, 14 Variational Auto-encoders (VAEs) for image generation, 15 and Auto-encoders for anomaly detection. 16
Generative Adversarial Networks (GANs) consist of a generator and a discriminator. The generator produces images after learning the distribution while the discriminator, a Deep Neural Network, 17 determines whether the generated image is real or fake. GANs are chosen for their ability to produce high-quality outputs from grainy inputs and for their flexibility and fine-tuning capabilities, crucial aspects of any model.
This study explores and derives the mathematical features of colour patterns through latent space exploration by random walks. 18 A 1D random walk involves an object moving left, moving right, or staying in place, with probabilities determining its future movements. In a 2D random walk, the movement extends to a plane (XY, YZ, or XZ), with steps up, down, left, or right taken with equal probabilities, akin to the movement of chess pieces on a board. In a 3D random walk, spatial positions are considered, and the object's position is predicted probabilistically with a Monte Carlo 19 randomised algorithm.
Random walks are implemented in this study to support style transfer and fusion, where a random walk in the latent space mixes various styles smoothly. This approach starts from a known point in the latent space 20 and explores random directions, generating unique art styles while mapping artistic parameters provided by the user, such as textures and brush stroke styles. These mapped parameters serve as inputs for the model, guiding it through random directions in the latent space. This method also generates a sequence of latent vectors, enabling a gradual transformation and evolution of the generated images. The transitions between styles in the outputs are gradual and exploratory, providing state-of-the-art results and enabling us to study abstract art patterns successfully. Further, this paper studies the distorted patterns that appear after a certain point in training and describes the statistical analysis and tests performed to compare the distorted colour space with the original space.
Therefore, the proposed work addresses the following research questions:
How can an efficient GAN architecture be developed to overcome common training pitfalls and improve stability and realism in abstract art generation?
How can mode collapse occurrences in GAN-based art generation be reduced, broadening the range of generated artworks, and enhancing diversity and stability?
What insights can be gained from exploring the latent space of generated abstract art through random walks, and how can this exploration contribute to the understanding of abstract art patterns?
The proposed study efficiently addresses the above research questions and provides the following contributions:
It addresses the challenge of developing an efficient GAN architecture for high-quality artwork generation by proposing a modified-DCGAN (mDCGAN), tailored to the unique demands of art generation while improving stability and realism.
The proposed mDCGAN incorporates careful adjustments in layer configurations and architectural choices, effectively combating issues like mode collapse and vanishing gradients.
The paper explores the generated latent space through random walks, allowing for an understanding of vector relationships between brush strokes and colours in abstract art, contributing to the study of abstract art patterns.
The evaluation results show a marked reduction in mode collapse occurrences compared to the standard DCGAN configuration, indicating enhanced diversity and stability in the generated artworks.
It includes a statistical analysis of unstable outputs after a certain period of GAN training, providing insights into the effectiveness of the proposed approach in revolutionising digital art generation and the digital art ecosystem.
The remainder of the paper is organised as follows: Section 2 describes the related work on pattern recognition and Generative Adversarial Networks (GANs). Section 3 describes the workflow and architecture of the proposed study. Section 4 discusses the brush stroke colour patterns qualitatively, the latent space exploration by random walks and a statistical analysis of distorted outputs at a certain period of training. Section 5 concludes the study and describes its future scope.
Related works
This section discusses the related literature on pattern recognition using machine learning techniques and on the use of GANs in several pattern recognition, generative and style transfer tasks. 21 Ref. 22 studied the spatial and temporal variations of water quality in the Suquia River Basin. The researchers used factor analysis, cluster analysis, principal component analysis and discriminant analysis to cluster the data and obtain patterns. They concluded that cluster analysis gives good results as an initial exploratory method to evaluate spatial and temporal differences and reduces the number of parameters by 77% to differentiate between four spatial areas, while discriminant analysis provides the best results for temporal and spatial analysis. It achieves a 73% reduction in parameters to differentiate between the wet and dry seasons, up to a precision of 87%. Ref. 23 studied different pattern recognition techniques for apple sorting. They made use of the K-Nearest Neighbours algorithm with 1 and 2 nearest neighbours, decision trees and artificial neural networks. They noted that, with respect to classification performance, the 1 and 2 nearest neighbour methods using five input features yielded the second-best results, while the neural network was able to detect non-linear relationships in apple sorting patterns.
In the field of pattern recognition and classification of artworks, Ref. 24 studied various pattern extraction techniques using Generative Adversarial Networks and Deep Convolutional Neural Networks to classify art periods, recognise emotions from artworks and construct a social network of artists. Ref. 25 used a Radial Basis Function neural network classifier to model western paintings for classification. These networks are very powerful and have been used in function approximation, pattern classification and data compression. For the feature extraction process, the researchers made use of Gabor wavelets, a popular wavelet transform in image processing. 26 Ref. 27 performed a 3-step hierarchical classification of paintings using face and brush stroke models. Their 3-step approach comprises colour classification, which groups portrait miniatures by their mean RGB value, followed by shape classification on a region-by-region basis, which reduces the search space to a specific Region of Interest, followed by stroke detection and classification. Ref. 28 studied fast texture synthesis using tree-structured vector quantisation. They implemented a Gaussian pyramid and a Markov Random Field-like architecture and used tree-structured vector quantisation, a common method for data compression, for acceleration. The approach can replicate an image given a texture as input. Ref. 29 studied style and abstraction in portrait sketching. The researchers replicate the sketch strokes of artists by performing edge detection with the Canny edge detection operator, in addition to stroke matching and curve detection. Ref. 30 analyses various algorithms and methods for stroke-based rendering (SBR). The optimisation methods include Voronoi algorithms, which use the properties of SBR to perform efficient global update steps, and trial-and-error algorithms, which perform heuristically chosen tests to reduce randomness.
The researchers also studied greedy algorithms but concluded that they are too slow for any interactive application. Ref. 31 proposed features derived from the colours, edges and grey-scale texture of images that discriminate paintings from photographs. They proposed a neural-network classification methodology with 6 sigmoidal units in a single hidden layer to perform painting-photograph discrimination.
Generative Adversarial Networks have, since their introduction, become popular in the field of art, where they are used for generation and style transfer. Ref. 32 compared various popular GAN architectures. They concluded that Pix2Pix could be relevant for contemporary, simple-styled style transfer tasks on ortho-images but is not suitable for old map styles, which are more varied and visually complex in content and style, while Cycle-GAN could be more relevant for such images. Ref. 33 uses a 5-layer CNN to perform style transfer on images. They noted that the speed of image synthesis is hindered by the resolution of the image and of the feature space, and that denoising images is a challenge with this architecture. Ref. 34 implemented UnityGAN to learn the style changes between cameras, producing shape-stable, style-unified images for each camera. They made use of skip-connections between multi-depth layers, which enabled the retention of more structural information and therefore accounted for the stability of the generated image. Ref. 35 proposed APDrawingGAN++ to transform a photo of a face into a high-quality APDrawing. It made use of auto-encoders to improve facial feature drawings, and lip and hair classifiers were introduced to guide the local generator and auto-encoder towards a desired style. Moreover, the researchers made use of a DT loss to penalise large misalignments. Ref. 36 compared and analysed various kinds of neural networks for art-based applications such as GANs, image stylisation, DeepDream and Perception Engines, which include image Fourier models. Ref. 37 implements the BigGAN-deep model on the ImageNet dataset with hierarchical latent spaces. BigGAN-deep differs from the BigGAN model in that it contains 2 extra 1 × 1 convolutions to provide the required number of output filters for the images. On increasing the depth by two, the researchers noted that performance was negatively affected to an extent.
Ref. 38 used mGANprior, which employs multiple latent codes to reconstruct real images with a pre-trained GAN model. It enables the use of GANs as a powerful prior for processing tasks such as colourisation, inpainting and image inversion. Ref. 39 implements DCGAN and finds the closest latent features in order to update the latent vector gradually and smoothly to generate the desired image. The architecture was used to make desired edits to images based on users' requirements. Ref. 40 proposed InterFaceGAN to interpret the semantics encoded in the latent space of GANs. It provides a rigorous analysis of the semantic attributes emerging in the latent space of well-trained GAN models, and then constructs a manipulation pipeline for leveraging the semantics in the latent code for facial attribute editing. The architecture is tested against encoder-decoder generative models and StyleGANs. Ref. 41 implements a progressive GAN growth experiment for improved quality in GAN outputs. The researchers start from a low resolution and progressively add new layers. They tested the performance of the model on CIFAR10, obtaining an inception score of 8.80.
Proposed work
The proposed work modifies the generator and discriminator of a DC-GAN for stable and enhanced art image generation. The overall workflow of the proposed work is shown in Figure 1. Random noise z is fed as input to the generator. Labelled data are used to train the discriminator to distinguish the real samples from the fake ones.

Figure 1. Workflow of the proposed work.
The images used for the study have different sizes, such as 1024 × 2048, 512 × 512 and 2048 × 2048. They are resized to a standard size of 256 × 256, the average dimensions of the images. Standard noise filtering techniques, namely Gaussian and median filters, are applied to the resized images in order to filter out additive Gaussian noise. The standard deviation used for the Gaussian filter was 0.001, so that features are not lost through blurring. Further, normalisation is performed by calculating a z-score for each of the 3 channels. Equation (1) describes the expression used for normalisation.
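The per-channel z-score normalisation can be illustrated with a minimal sketch. It assumes that each channel is standardised with its own mean and standard deviation and that a channel is available as a flat list of pixel values; whether the statistics are computed per image or over the whole dataset is not stated in the text, and the function name `zscore_channel` is illustrative only.

```python
import math


def zscore_channel(values):
    """Standardise one colour channel: subtract its mean, divide by its std.

    Assumption: `values` is a flat list of pixel intensities for a single
    channel and the channel is not constant (std > 0).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def zscore_image(channels):
    """Apply the z-score independently to each of the (R, G, B) channels."""
    return [zscore_channel(channel) for channel in channels]
```

After this step each channel has zero mean and unit variance, which keeps the inputs on a scale comparable to the tanh output range of the generator.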
This section discusses the proposed architecture of mDCGAN for art image generation. A GAN consists of two deep neural network models, a generator and a discriminator. 12 The generator tries to overcome the discriminator by making it predict all generated outputs as real, whereas the discriminator tries to distinguish between real and fake images, setting up an adversarial scenario as per game theory. 42 The proposed work involves modifications to both the generator and discriminator components of a DC-GAN architecture, aimed at achieving stable and enhanced art image generation.
A DC-GAN comprises two deep neural network models engaged in an adversarial game—the generator strives to produce outputs that the discriminator predicts as real, while the discriminator distinguishes between real and fake images. 12 The proposed mDCGAN architecture is based on DCGAN, but with adaptations and architectural changes to accommodate 256 × 256 image dimensions. 43 The modified generator and discriminator layers of the mDCGAN architecture are as shown in Figure 2.

Figure 2. Modified architecture of (a) Generator and (b) Discriminator.
These architectural modifications of mDCGAN aim to enhance the stability and diversity of art image generation, accommodating the unique demands of generating art while mitigating common challenges associated with GANs such as mode collapse and noisy outputs. A 4 × 4 kernel for the final convolution layer, a final reshape layer and additional dropouts are added to the discriminator to ensure that the discriminator does not overfit the generator. As suggested in DC-GAN, mDCGAN uses leaky ReLU activation functions with a negative slope of 0.2 for the discriminator. The final layer of the generator uses the tanh(z) activation function, where z represents the output of a convolution mapping. In mDCGAN, square kernels of length 4 are used for every layer of the generator, with padding set to 1. A convolution layer in the generator consists of (i) a transpose convolution layer of stride 2 and padding 1, (ii) a batch normalisation layer and (iii) a ReLU 44 activation function for all layers except the last layer, which uses tanh. The number of kernels decreases by a factor of 2 for every layer to construct an image with 3 channels. The first 2 layers in the generator are linear and reshape layers respectively, which transform a 100-dimensional vector into a 16384-dimensional vector with the help of a linear transformation. The 16384-feature vector is reshaped to a block of size 1024 × 4 × 4, where 1024 represents the number of kernel filters or channels and 4 × 4 represents the height and width dimensions of the embedding. After reshaping, the embeddings are transpose convolved through 6 layers to construct an image of size 3 × 256 × 256, where 3 represents the RGB channels. For the discriminator, mDCGAN uses square kernels of size 3 for every layer with padding set to 1. A convolution layer in the discriminator consists of (i) a convolution mapping of stride 2 and padding 1, (ii) a batch normalisation layer and (iii) a leaky ReLU activation function with negative slope set to 0.2.
For every layer, the number of kernels increases by a factor of 2, which enables the model to learn low-level features at a smaller spatial dimension. The final convolution layer of the discriminator uses a square kernel of size 4; this reduces the dimensions to 1 × 1 × 1 with fewer convolution layers, ensuring that the discriminator does not overfit the generator due to a deeper network. The final layer of the discriminator is a reshape layer that transforms the 1 × 1 × 1 output into a one-dimensional vector, so that it can be passed into a probability mapping function such as sigmoid. Table 1 and Table 2 describe the layers of the discriminator and generator respectively.
Table 1. Discriminator layers.
Table 2. Generator layers.
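The layer arithmetic of the two networks can be checked with a short sketch. It relies on the standard output-size formulas for convolution and transposed convolution. Three details are assumptions rather than statements from the text: that the last generator layer jumps directly to 3 channels, that the discriminator has six stride-2 layers before its final 4 × 4 convolution, and that this final convolution uses stride 1 and no padding. All function names are illustrative.

```python
def transposed_conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of a transposed convolution (no output padding or dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a standard convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_shapes(channels=1024, size=4, layers=6, out_channels=3):
    """Trace (channels, height, width) from the 1024 x 4 x 4 block to the image."""
    shapes = [(channels, size, size)]
    for i in range(layers):
        size = transposed_conv_out(size)  # each layer doubles height and width
        # Assumption: channels halve per layer, and the last layer emits RGB.
        channels = out_channels if i == layers - 1 else channels // 2
        shapes.append((channels, size, size))
    return shapes


def discriminator_sizes(size=256, strided_layers=6):
    """Trace the spatial size through the discriminator down to 1 x 1."""
    sizes = [size]
    for _ in range(strided_layers):
        size = conv_out(size)  # 3 x 3 kernel, stride 2, padding 1 halves the size
        sizes.append(size)
    # Assumed final layer: 4 x 4 kernel, stride 1, no padding.
    sizes.append(conv_out(size, kernel=4, stride=1, padding=0))
    return sizes
```

With kernel 4, stride 2 and padding 1, every transposed convolution exactly doubles the spatial size, which is why six such layers take the 4 × 4 embedding to 256 × 256.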
During training, labels are assigned to real and fake images. The images from the dataset are labelled as real and the images generated from noise are labelled as fake. The role of the discriminator is to classify all real images as real and all fake images as fake. First, the output of the discriminator is computed on an equal mix of real and fake images. The loss is then estimated and used to update the discriminator weights. Next, a 100-dimensional noise vector is sampled from a uniform random distribution and passed into the generator. The generated samples are passed into the discriminator, which classifies them as real or fake, and the result is compared against the real label. That is, the generator tries to convince the discriminator that its generated image is real. If the generator succeeds, it is rewarded; otherwise it is penalised. The loss value is computed and the weights of the generator are updated. This sets up an adversarial training loop.
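The alternating update described above can be sketched on a deliberately tiny problem. This is not the mDCGAN itself: the generator is assumed to be a scalar affine map G(z) = a·z + b, the discriminator a logistic unit D(x) = sigmoid(w·x + c), the "real" data a Gaussian around 3.0, and plain gradient descent replaces Adam. Only the order of the two updates and the choice of labels follow the text.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def bce(p, target):
    """Binary cross-entropy for one probability, clamped to avoid log(0)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def train_toy_gan(steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # toy generator parameters (hypothetical)
    w, c = 0.1, 0.0  # toy discriminator parameters (hypothetical)
    history = []
    for _ in range(steps):
        x_real = rng.gauss(3.0, 0.5)  # stand-in for a real image
        z = rng.uniform(-1.0, 1.0)    # noise from a uniform distribution
        x_fake = a * z + b

        # Discriminator step: real sample labelled 1, generated sample labelled 0.
        d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
        d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: the generated sample is scored against the real label.
        d_fake = sigmoid(w * x_fake + c)
        g_loss = bce(d_fake, 1.0)
        grad_x = (d_fake - 1.0) * w  # gradient of g_loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x
        history.append((d_loss, g_loss))
    return (a, b), (w, c), history
```

The generator gradient passes through the discriminator, so the generator is only ever informed about the data through the discriminator's current judgement, which is the source of the adversarial dynamics.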
Objective function
The mDCGAN is trained using the binary cross-entropy loss function. The ground-truth labels are real or fake, and the input is a probability obtained from the sigmoid function. Equation (2) describes the expression used to compute the loss value.
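For reference, the binary cross-entropy takes the standard form below, which is presumably what Equation (2) expresses. The notation is assumed here rather than taken from the original: N is the number of samples in a batch, y_i is the ground-truth label (1 for real, 0 for fake) and p_i is the sigmoid output of the discriminator for sample i.

```latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
```

In the discriminator update, real images carry y_i = 1 and generated images carry y_i = 0; in the generator update, generated images are scored against y_i = 1.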

Figure 3. Generator and discriminator loss.
Both the generator and discriminator were trained using the Adam optimiser. 45
The Adam optimiser combines momentum with the exponential moving average of squared gradients from RMSProp in order to attain convergence quickly. The following parameters of the Adam optimiser are used in the proposed study, based on the hyperparameters used by the researchers of DCGAN:
The learning rate parameter was set to 0.002.
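A single Adam update can be sketched as follows for one scalar parameter. The learning rate of 0.002 is taken from the text; the values beta1 = 0.5, beta2 = 0.999 and eps = 1e-8 are assumptions (beta1 = 0.5 is the value reported for DCGAN), since the exact settings used here are not recoverable from the text. The function name `adam_step` is illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m and v are the running first and second moment estimates and t is the
    1-based step count. beta1, beta2 and eps are assumed values.
    """
    m = beta1 * m + (1.0 - beta1) * grad         # momentum-style average of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # RMSProp-style average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for the first moment
    v_hat = v / (1.0 - beta2 ** t)               # bias correction for the second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def minimise_quadratic(x=1.0, steps=100):
    """Usage example: minimise f(x) = x^2, whose gradient is 2x."""
    m = v = 0.0
    for t in range(1, steps + 1):
        x, m, v = adam_step(x, 2.0 * x, m, v, t)
    return x
```

Because the update divides by the root of the second moment, the step size is roughly bounded by the learning rate regardless of the gradient scale, which is one reason the learning rate and beta1 are the main stability controls in GAN training.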
Algorithm 1 explains the training of the proposed mDCGAN for stable abstract art generation.
Results and discussion
Experiment setup
The experiments were performed on a 2022 model MacBook Pro with an Apple Silicon M2 chip. The system has 8 GB of unified memory and 256 GB of storage. It has an 8-core CPU, split into 4 performance cores and 4 efficiency cores, and a 10-core GPU. In addition, the system has a 16-core Neural Engine. The models used for the experiment were trained with PyTorch 2.0.1, with the help of the image transforms package Torchvision version 0.15.2. The beta version of Torchvision was used for image transformations. PyTorch makes use of the Metal Performance Shaders (MPS) backend for accelerated GPU training. This extends the PyTorch framework, providing the scripts required to run operations on a Mac. In addition, libraries such as NumPy, Matplotlib and scikit-learn were also used to conduct the experiments. During experimentation, the train and test metrics were continuously recorded on Weights and Biases, which provided insightful plots of the train and test curves. In order to prevent the MPS backend from running out of memory and to avoid memory leaks, the following steps were put into practice in the experiment process:
Dataset
The dataset chosen for experimentation is the abstract art dataset obtained from Kaggle. 47 It is a diverse dataset comprising 2782 images of different paintings, each with a distinct colour scheme and pattern, scraped at random from various web sources. Initial visualisation shows that most images use strokes of green, red, black, and orange. For training, 2000 randomly sampled images were used from the dataset while maintaining the overall distribution of the images. The images have varied sizes and resolutions such as 1024 × 2048, 1024 × 1024 and 512 × 512; for the purpose of uniformity we resized all images to 1024 × 1024. All images were smoothed using standard filters such as Gaussian and median filters, and no data augmentation was required. Ref. 51 discusses the use of this dataset with a non-parametric model which helps in texture synthesis and has been used in our model to recognise brushstroke patterns. Since our goal for this research was to enhance, analyse and study brushstroke patterns, this dataset is a good fit for our approach.
Environment
Tuning of hyper-parameters, especially when training Generative Adversarial Networks, plays a crucial role in recognising patterns and generating stable images. GANs are highly susceptible to unstable training, mode collapse, and noise; it is therefore important to tune the hyper-parameters carefully. 48
Table 3 describes the parameters used to obtain stable training of mDCGAN. Stability was the main consideration when tuning the Adam Beta 1 and learning rate parameters. Batch size was set to 32, taking the memory constraints of the system environment into account.
Table 3. Summary of training environment parameters.
The model was trained according to the training parameters mentioned in Section 4.3. The work involved training the two modules of mDCGAN together: the generator and the discriminator. Figure 3 shows the training curves of the generator and discriminator. Orange represents the generator loss and blue represents the discriminator loss. From Figure 3, it can be seen that a close-to-ideal situation of GAN training is achieved in the training of mDCGAN. The generator and discriminator both try to gain the upper hand against each other. The loss curves of the generator and discriminator are similar up to epoch 300, after which they diverge. The loss values oscillate between 0.015 and 0.016 for both the discriminator and the generator. No upper or lower bounds on the loss were observed, owing to the unstable training conditions of GANs. Note that the loss function is not treated as a metric for training and is used only for plot visualisation, since training loss does not wholly represent the image distribution error but is used due to its convex nature. After epoch 500 there is more noise in the outputs due to overfitting.
Qualitative analysis
The following section discusses the patterns that were obtained from various abstract paintings. It also discusses how a random walk is performed in the latent space to explore new embeddings and understand the mathematical relations between these features.
Analysis of generated brush stroke patterns
The generated images are shown in Figure 4(a) to Figure 4(e). As mentioned in Section 3, the input is a random noise vector of 100 dimensions drawn from a uniform distribution. It can be observed that the model generates multiple different patterns using the colours present in the original dataset. All the generated patterns are different, each having its own unique set of features. We observe the dominance of black strokes, which is a prevalent feature in modern-day abstract art. 49 Other patterns include brush strokes of lighter and darker shades of blue, and thin shades of green and red. These colours usually dominate in abstract art paintings, symbolising various emotions and depictions of artists.

Figure 4. Generated images of DCGAN.
This section describes the experiment of exploring the latent space. The distribution of the patterns lies in the latent space, which is a multi-dimensional abstract space that encodes the information of the outside world. The latent space of the colour patterns generated by mDCGAN is explored by performing algebraic vector operations to discover new patterns and colours.
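A random walk through the latent space can be sketched as follows. The 100-dimensional uniform starting point follows the text; the range [-1, 1], the Gaussian step distribution and the step size are assumptions, and `latent_random_walk` and `interpolate` are illustrative helper names rather than functions from the original implementation. Each latent vector in the returned path would be passed through the trained generator to render one frame of the walk.

```python
import random


def latent_random_walk(dim=100, steps=10, step_size=0.1, seed=0):
    """Return a list of latent vectors forming a random walk.

    Assumptions: the start point is uniform in [-1, 1] and each step adds
    independent Gaussian noise scaled by `step_size` to every coordinate.
    """
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    path = [z]
    for _ in range(steps):
        z = [zi + step_size * rng.gauss(0.0, 1.0) for zi in z]
        path.append(z)
    return path


def interpolate(z_a, z_b, alpha):
    """Linear blend of two latent vectors, an example of vector algebra on codes."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]
```

A small step size keeps consecutive vectors close together, so the corresponding generated images change gradually rather than abruptly.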

Figure 5. Output-1 of random walk at latent space.

Figure 6. Output-2 at mid-training of random walk at latent space.

Figure 7. Output-3 of random walk at latent space.
The section that follows outlines the unstable behaviour of GAN seen over time.
Qualitative and numerical analysis
The model was trained for up to 500 epochs, and the results were visualised from the 150th epoch onwards. It is noted that after epoch 250, the patterns are distorted, with highly pixelated outputs resulting in heavy noise and feature loss. This is probably because the generator weights overcorrect, producing large updates when the discriminator begins to outperform the generator. Figure 8(a) and (b) highlight this observation further.

Figure 8. Output images depicting the noise and feature loss.
From Figure 8, it can be seen that the black brush strokes are still clear, while the other colours such as blue, green and yellow spread over the canvas, resulting in high distortion. A quantitative analysis was performed on the noisy image distribution by comparing it with the stable distribution. Table 4 contains the metrics used to support the qualitative argument. Note that for the SNR computation, the experiment treats the generator output after epoch 275 as noise and the stable images at epoch 250 as signal. A Signal to Noise Ratio greater than 0 represents strong signal power. The SNR supports the argument that the images after epoch 250 are highly distorted, as noted qualitatively. In addition, L2 and L1 distances were computed between the two distributions. The values in Table 4 for these two metrics show that the two distributions are dissimilar, further supporting the qualitative argument. Note that these are informal metrics for measuring image quality and GAN performance.
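The three informal metrics can be sketched as follows. The exact SNR definition used in the study is not given, so this sketch assumes the common decibel form, 10·log10 of the ratio of mean signal power to mean noise power, with images supplied as flat lists of pixel values. The L1 and L2 distances are the usual element-wise sums, and all function names are illustrative.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels (assumed definition).

    `signal` holds pixel values of a stable image and `noise` those of a
    distorted image; a positive value means the signal power dominates.
    """
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def l1_distance(a, b):
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

On the decibel scale, a value of 0 corresponds to equal signal and noise power, which matches the statement that a ratio above 0 indicates strong signal power.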
Table 4. Comparison between stable and erroneous brush stroke colour distribution.
To formalise the argument that there is a significant difference between the art space distribution before and after epoch 250, an F-test is conducted to compare the variances of the two distributions. The test statistic is computed from the sample variances of around 101 samples. Table 5 shows the sample mean and variance for both distributions.
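The test statistic of such an F-test can be sketched as the ratio of the two unbiased sample variances. The function name and the toy inputs in the usage note are hypothetical. The critical values needed for the decision are not computed here, because they require quantiles of the F distribution from statistical tables or a statistics library.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return the F statistic and its degrees of freedom.

    The statistic is the ratio of the unbiased sample variances; the degrees
    of freedom are (n_a - 1, n_b - 1). Both samples need at least two values
    and sample_b must not be constant.
    """
    var_a = variance(sample_a)
    var_b = variance(sample_b)
    return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
```

For example, `f_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])` compares a sample with variance 2.5 against one with variance 10, so the statistic is far from 1, the value expected when the variances are equal.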
Table 5. Sample mean and standard deviation.
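The hypotheses are defined as follows. Let $\sigma_1^2$ and $\sigma_2^2$ denote the variances of the distributions before and after epoch 250 respectively. The null hypothesis is $H_0: \sigma_1^2 = \sigma_2^2$ and the alternative hypothesis is $H_1: \sigma_1^2 \neq \sigma_2^2$.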
Table 6. Critical value and test statistic.
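For a two-tailed test at significance level $\alpha$, the rejection region is defined as $F > F_{\alpha/2,\, n_1 - 1,\, n_2 - 1}$ or $F < F_{1 - \alpha/2,\, n_1 - 1,\, n_2 - 1}$, where $F = s_1^2 / s_2^2$ is the ratio of the two sample variances and $n_1$, $n_2$ are the sample sizes.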
Conclusion
In this study, a modified version of DCGAN is proposed to generate abstract art, focusing on patterns involving colour combinations and brush strokes. The findings highlight a notable preference within the abstract art community for darker colour palettes, including black, dark red, purple, and deep blue. These darker hues are frequently accompanied by lighter shades in the overall colour scheme. Moreover, a series of random walk experiments were conducted to explore the latent space, which encompasses diverse multicoloured brush strokes. Qualitatively, this analysis revealed vector relationships between colours at both early and late stages of training. Furthermore, statistical analysis is applied to assess the stability and quality of GAN-generated outputs after the 250th training epoch. Metrics such as the Signal to Noise Ratio and distance between distributions, as well as hypothesis testing for variations in distribution variances, demonstrated significant distinctions between sample distributions before and after this critical training epoch. Future research endeavours would consider the utilisation of larger GAN architectures, such as StyleGANs, to achieve higher-resolution artistic outputs. Additionally, the exploration of techniques such as edge detection and gradient-based brush stroke analysis, as well as continued investigations into latent space exploration, hold promise for advancing this field.
Footnotes
Author contribution
Srinitish Srinivasan: Conceived and designed the research project, collected and analyzed the data, and wrote the initial draft of the manuscript. Varenya Pathak: Assisted in data collection and analysis, and contributed to the development of the experimental design. Abirami S: Conducted literature reviews, assisted in the development of the theoretical framework for the study, and helped in drafting various sections of the paper. A Sherly Alphonse: Designed and implemented the mathematical models and simulations used in the research, and was actively involved in the editing and proofreading process. S Abinaya: Provided expertise in statistical analysis and data interpretation, helped with statistical modelling, and played a key role in creating the figures and tables for the paper.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and material
Publicly available dataset.
