DeepCreativity: measuring creativity with deep learning techniques

Abstract

Measuring machine creativity is one of the most fascinating challenges in Artificial Intelligence. This paper explores the possibility of using generative learning techniques for automatic assessment of creativity. The proposed solution does not involve human judgement; it is modular and of general applicability. We introduce a new measure, namely DeepCreativity, based on Margaret Boden’s definition of creativity as composed of value, novelty and surprise. We evaluate our methodology (and related measure) through a case study, i.e., the generation of 19th century American poetry, showing its effectiveness and expressiveness.

Keywords

Computational creativity, deep learning, creativity measure, American poetry

1 Introduction

Evaluation is a crucial concern in Artificial Intelligence and, more generally, in science. Measures and metrics are fundamental not only to check the validity of a hypothesis, but also to understand if some given results can be used with confidence as a starting point for future research. An example is Shannon’s entropy, which plays a central role as a measure of information, choice and uncertainty [39] and underpins many results in Information Theory [20]. The role of measures is even more crucial in machine learning, where models need to be evaluated during and after training. For example, most machine learning algorithms are evaluated using metrics like accuracy.

Excellent progress has been made in benchmark tasks coupled with metrics used to estimate the performance of algorithms [34]: examples include [32] for machine translation or [36] for image recognition. At the same time, there is also a need to derive new metrics for examining the behavior of algorithms in different environments and in relation to society [34]. Among the spectrum of behaviors that could be exhibited by a machine, creativity is certainly one of the most interesting [30]. In fact, we have witnessed the emergence of an entirely new field of research, namely Computational Creativity, with a focus on the study of the behaviors exhibited by artificial systems that would be deemed creative [7, 48]. Indeed, one of the key goals of this field is the definition of computational techniques for measuring creativity.

In this paper, we present a novel methodology (and related measure) to assess the creativity of an artifact. In particular, DeepCreativity is based on the well-known definition of creativity provided by Margaret Boden: “creativity is the ability to come up with ideas or artifacts that are new, surprising and valuable” [5]. Although it is not the only definition of creativity available (over one hundred definitions have been proposed over the years [1, 42]), it is central in the field of computational creativity. The three aspects at the basis of this metric have played a prominent role in the scholarly discourse around the definition of machine creativity. Our proposed measure uses deep learning techniques, avoiding the need for a human in the process, to measure how valuable, novel and surprising an artifact is with respect to a given context and, therefore, to quantify the ability of an agent to come up with creative results. To the best of our knowledge, this is one of the first attempts to define an evaluation method for assessing the overall creativity of an artifact that is automatic and of general applicability.

This work is structured as follows: a review of the literature about automatic methods to assess creativity is presented in Section 2; then, in Section 3 we discuss the proposed creativity measure. An evaluation of DeepCreativity is presented in Section 4, considering a case study of text generation in the context of 19th century American poetry; finally, we discuss limitations of the proposed approach and potential future work in Section 5.

2 Related work

Over the years, several computational approaches have been proposed to automatically assess the creativity in products made by (human or artificial) agents, differing in the scope of evaluation or in the method. A complete survey can be found in [14]. All of them consider value and novelty as aspects of creativity, while only some of them also include surprise. In the following, we will consider the three factors separately.

2.1 Value

Value, sometimes referred to as quality, expresses how an artifact compares to others in its class in terms of utility, performance or attractiveness. It is typically defined as a weighted sum of performance attributes or as a reflection of the acceptance of the artifact by society [25]. The authors of [11] follow the latter definition: they suggest computing creativity by using an art graph where each vertex represents an artwork and each arc connecting an older work to a newer one is labeled with the similarity between the two, calculated by means of an appropriate similarity function. The higher the similarity with subsequent works, the higher the value (and the higher the creativity). However, this method cannot compute value for the most recent works, but only for older ones. The former definition is more common in the literature. For instance, in [25] value is derived as the weighted sum of pre-defined performance variables. In [26], value is defined using clusters of artifacts built on a performance space, with artifacts expressed as sets of attribute-value pairs. The authors of [15] define it as the synergy [8] between artifacts, expressed following the regent-dependent model. Several domain-specific methods also follow the definition of value as the sum of performance attributes or performance measures: for example, for poetry generation, the authors of [50] consider topic distribution (through LDA), fluency (through a neural language model) and coherence (through mutual information and TF-IDF) as components of value. In [52], coherence is used (through BLEU, originally proposed for machine translation in [32]) together with quality (through perplexity), while the authors of [51] use BLEU only. However, defining value as a weighted sum of sub-components requires correctly identifying all the relevant factors and their relative weights, which is a complex and time-consuming task.

2.2 Novelty

Novelty is commonly defined as the measure of how much an artifact differs from known artifacts in its class [25]. For this reason, a classic technique to measure novelty consists of calculating the distance between a given artifact and the other artifacts in a descriptive space, as discussed in [25] and [26]. The descriptive space is usually identified by the attributes used to define the artifacts. Similarly, domain-specific methods consider novelty in terms of distance or dissimilarity: for instance, in the case of text generation, the authors of [21] consider novelty as the average semantic distance between the dominant terms included in the textual representation of the story, compared to the average semantic distance of the dominant terms in all stories. In [50], diversity and innovation in poetry generation are measured by means of bigram-based average Jaccard similarity. As for value methods, the requirement of defining artifacts in terms of attributes appears to be a rather strong limitation.
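
As an illustration of the dissimilarity-based view described above, the following minimal sketch computes an average bigram-based Jaccard similarity between one text and a set of reference texts. It is not the exact procedure of [50]: the function names and the lowercase whitespace tokenization are assumptions made only for this example.

```python
def bigrams(tokens):
    """Return the set of adjacent token pairs in a token list."""
    return set(zip(tokens, tokens[1:]))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_bigram_jaccard(candidate, references):
    """Mean bigram Jaccard similarity between one text and a list of reference texts.

    Tokenization by lowercase whitespace splitting is an assumption of this sketch.
    """
    cand = bigrams(candidate.lower().split())
    sims = [jaccard(cand, bigrams(ref.lower().split())) for ref in references]
    return sum(sims) / len(sims)
```

A lower average similarity to the known texts would then be read as a higher degree of novelty.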

A different definition of novelty has been proposed in [3], namely as the degree to which an input differs from what an observer has experienced before. In [11] novelty is defined by considering the time dimension of personal experience: the lower the degree of similarity between an artifact and the previous works, the higher the novelty contribution to creativity. Even if not exactly used as an evaluation technique, in [10] a novelty score is proposed to guide the training of the generative part of the Creative Adversarial Network. This can be considered a creativity-oriented variant of Generative Adversarial Networks (GANs) [16]. In addition to the classic adversarial loss provided by the discriminative model, the generator is trained to maximize a novelty loss that represents how much the generated artifact differs from previous works in terms of style. Although considering novelty as the deviation from style norms is somewhat simplistic, it only requires a style classifier, while automatically capturing an important aspect of novelty.

2.3 Surprise

In [3], surprise is defined as the degree of disagreement between the real input and what was expected in its place. This classic definition of surprise based on unexpectedness is typically also referred to as surprisal [43]. In [25], unexpectedness is calculated considering whether or not the artifact follows the expected next artifact in the pattern recognized on recent artifacts. In [17], surprise is measured as the unlikelihood of observing a particular artifact according to the predictions about relationships between its attributes. In the specific domain of text generation, in [21], surprise is defined as the average semantic distance between consecutive fragments of each story. For sequential artifacts like texts or sounds, the authors of [6] adopt as measures the expected maximum surprise (one minus the probability of the most unexpected token of the artifact) and the expected count of ψ-surprise (the count of all the tokens for which predictability is lower than a given threshold $\frac{ψ}{K}$), where the expectations are provided by an audience neural network. In a similar way, [24] proposes to quantify surprise considering both the probability of the event X of interest and the probability of the most probable event Y, since the surprise of an event X also depends on the certainty of Y (e.g., ten equiprobable events have a very high unexpectedness, but they should have a very low surprise, since we are not surprised to see one of them occurring).
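
The two token-level quantities described above for [6] can be sketched as follows. This is only an illustration under stated assumptions: the probabilities are hypothetical outputs of an audience model, the function names are invented for this example, and K is treated as a free parameter because this section does not define it.

```python
def max_surprise(token_probs):
    """One minus the probability of the least expected token of the artifact."""
    return 1.0 - min(token_probs)


def psi_surprise_count(token_probs, psi, k):
    """Number of tokens whose predicted probability is below the threshold psi / k."""
    threshold = psi / k
    return sum(1 for p in token_probs if p < threshold)


# Hypothetical audience-model probabilities for the four tokens of one artifact.
probs = [0.60, 0.05, 0.30, 0.01]
highest = max_surprise(probs)               # driven by the token with probability 0.01
count = psi_surprise_count(probs, 0.5, 5)   # tokens with probability below 0.1
```

In this toy case two tokens fall below the threshold, and the maximum surprise is determined by the single least predictable token.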

A quite different approach is adopted in [26], where the authors consider a new artifact as surprising if it creates a new cluster in the conceptual space (instead of perfectly fitting into an existing one). The idea of surprise as related to the difference between prior and posterior models is at the basis of Bayesian surprise [2], used in [15] and [46]. It measures surprise in terms of the impact of a data point that changes a prior distribution into a posterior distribution, calculated by applying Bayes’ theorem (considering artifacts as a composition of attributes); here, surprise is the post-observation change rather than the prediction error.
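
A minimal sketch of Bayesian surprise over a discrete set of hypotheses is given below, taking the distance between posterior and prior to be the Kullback-Leibler divergence KL(posterior || prior). The function names and the discrete setting are assumptions of this example, not the implementations used in [15] or [46].

```python
import math


def posterior(prior, likelihood):
    """Bayes' theorem over a discrete set of hypotheses."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def bayesian_surprise(prior, likelihood):
    """KL(posterior || prior) in nats: how much the observation changes the beliefs."""
    post = posterior(prior, likelihood)
    return sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)
```

An observation that is equally likely under every hypothesis leaves the prior unchanged and therefore carries zero surprise, however improbable the observation itself may be.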

3 Measuring creativity using deep learning

We now present DeepCreativity, a new creativity measure based on Deep Learning. The goal is to define a measure of more general applicability. Deep Learning is used to avoid the need to identify the attributes required to describe the artifacts or the components of creativity [14]. This leads to a measure that allows for automatic evaluation of artifacts. As discussed in Section 1, DeepCreativity is based on the definition of creativity proposed by [5]. Therefore, the measure is based on three main factors, which will be explored separately in the next subsections: value (Subsection 3.1), novelty (Subsection 3.2), and surprise (Subsection 3.3). Finally, in Subsection 3.4, we will put everything together by providing a unified definition of creativity.

3.1 Value

We measure value by means of the discriminative part of a Generative Adversarial Network [16]. The GAN is trained by considering the real artifacts as the true ones; in this way, the discriminative model should learn a representation of real (and valuable) data, and its evaluation of a new artifact provides insight into its value in that context. Therefore, the value of an artifact a over the value discriminator D_v can be expressed as: $V (a, D_{v}) = D_{v} (a),$ (1) with V (a, D_v) naturally constrained between 0 (not valuable at all) and 1 (highly valuable), since a sigmoid activation is applied to the output layer of D_v.

The choice of the real artifacts influences the value measure proposed above (acting as an Inspiring Set [35]). While it can be seen as a limitation of the approach, it is highly coherent with the nature of creativity itself. Creativity and, in particular, value are deeply context-dependent: the same work, proposed in two different moments of history or to two different social groups, may be evaluated differently [5]. Under this lens, the need for real artifacts conceals the opportunity of representing, within the measure, a fundamental aspect of creativity. The real data used during the GAN’s training will therefore represent a specific context, well-defined in temporal and cultural terms.

To train the GAN, it is important to distinguish between continuous tasks (like image generation) and sequential tasks (like text or sound generation). With respect to continuous applications, a GAN can be trained using the following loss function [16]: $L = \min_{G} \max_{D} V (D, G) = \mathbb{E}_{x \sim p_{data} (x)} [\log D (x)] + \mathbb{E}_{z \sim p_{z} (z)} [\log (1 - D (G (z)))],$ (2) with p_data as the real data distribution (representing the desired context), p_z as the prior distribution of the input noise variable z, and with discriminator D and generator G trained alternately; notice that several refinements have been proposed in recent years (see [18] for potential variations).
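
To make Equation (2) concrete, the following sketch estimates V(D, G) from two batches of discriminator outputs. The function name and the idea of passing precomputed outputs are assumptions of this example; a real training loop would obtain these values from the networks D and G.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) in Equation (2).

    d_real: discriminator outputs D(x) on real artifacts, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated artifacts, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term
```

The discriminator tries to increase this quantity (outputs close to 1 on real artifacts and close to 0 on generated ones), while the generator tries to decrease it.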

As far as sequential applications are concerned, the impossibility of directly applying GANs to these tasks is a well-known problem [19]. A common way to solve it is by using SeqGAN [51]. SeqGAN considers the sequence generation process as a sequential decision making process, defining a reinforcement learning framework in which the generative model G_θ is the agent, the current state Y_{1:t-1} = (y₁, . . . , y_t-1) is composed of the tokens generated so far, the next action y_t is the next token to be generated, and the reward is the evaluation provided by the discriminative model D_φ. The generative model is then seen as a stochastic parametrized policy; Monte Carlo search is used to approximate the state-action value and directly train the policy via policy gradient [51]. More specifically, SeqGAN uses the REINFORCE algorithm [49] for learning the policy (but other methods can be used as well [12]), which leads to the following update rule: $\theta \leftarrow \theta + \alpha \, Q_{D_{\phi}}^{G_{\theta}} (Y_{1 : t - 1}, y_{t}) \, \nabla_{\theta} \ln G_{\theta} (y_{t} \mid Y_{1 : t - 1}),$ (3) where α is the learning rate and Q is the expected return obtained by the N-time Monte Carlo search.
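
The update rule in Equation (3) can be illustrated with a deliberately simplified policy. In the sketch below the policy is a softmax over one logit per token and ignores the previously generated tokens, which is an assumption made only to keep the example short; SeqGAN itself uses a recurrent generator, and the value of Q would come from Monte Carlo search rather than being passed in by hand.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta, token, q_value, alpha):
    """One update theta <- theta + alpha * Q * grad ln G_theta(token).

    For a softmax policy, grad ln G_theta(token) = one_hot(token) - softmax(theta).
    """
    probs = softmax(theta)
    return [
        t + alpha * q_value * ((1.0 if i == token else 0.0) - p)
        for i, (t, p) in enumerate(zip(theta, probs))
    ]
```

With a positive return Q, the step raises the probability of the sampled token and lowers that of the others, in proportion to both α and Q.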

3.2 Novelty

With respect to novelty, our definition is inspired by CAN [10], although the deviation from style norms cannot be used directly to measure the difference between artifacts. Therefore, as additionally done by the CAN discriminator, a neural network D_n is trained to correctly recognize the style of real artifacts (from the given context). The neural network can just be a simple classifier (as in [31] for music or in [40] for paintings), outputting a probability vector of length N equal to the number of possible classes. Consequently, a novelty measure can be defined as: $N (a, D_{n}) = 1 - \frac{\sqrt{\sum_{i = 1}^{N} \left( \frac{1}{N} - y_{i} \right)^{2}}}{UB}, \quad \text{with } UB = \frac{\sqrt{N (N - 1)}}{N},$ (4) where y is the output vector (of length N and sum 1) of D_n given artifact a as input. The formula computes the Euclidean distance between y and the desired target vector of equiprobable values; in addition, it is constrained between 0 and 1: it is equal to 1 when the distance is minimum (i.e., when the two vectors are equal) and equal to 0 when the distance is maximum (i.e., when y is a one-hot vector). Please refer to Appendix A for the proof of this property.
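
Equation (4) translates directly into a few lines of code. The sketch below takes the classifier output y as a plain list of probabilities; the function name is an assumption of this example, and at least two classes are required so that UB is nonzero.

```python
import math


def novelty(y):
    """Equation (4): 1 - ||u - y||_2 / UB, with u the uniform vector over N classes.

    y is the probability vector produced by the style classifier (length N >= 2).
    """
    n = len(y)
    distance = math.sqrt(sum((1.0 / n - yi) ** 2 for yi in y))
    upper_bound = math.sqrt(n * (n - 1)) / n
    return 1.0 - distance / upper_bound
```

A classifier that is maximally undecided about the style gives a novelty of 1, while one that assigns all the probability to a single class gives a novelty of 0 (up to floating-point rounding).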

3.3 Surprise

With respect to surprise, we follow the conceptual framework presented in [2]. We start from a sequential generative model G_s trained to predict the next token given the previous ones on an appropriate training set (temporally and culturally defined, as stated for value), and consider the impact of an artifact a on G_s. Its influence is calculated as the weight correction that would be applied to G_s if G_s were trained to correctly predict a. In analogy with Bayesian surprise, surprise is measured as the distance between the prior G_s (before training) and the posterior G_s (after training on a). The difference lies in how the posterior distribution is obtained, namely not by means of Bayes’ theorem, but through backpropagation and gradient descent. Notice that this idea is very close to the intrinsic reward presented in [37], where a measure of surprise is derived by maximizing a distance function between the prior and posterior distributions of a predictive model.

At inference time, only measuring surprise is relevant, while the model update is not actually required. The update is only used to compute the weight correction Δw_ji, which expresses how much the posterior distribution will differ from the prior. Given an artifact a = {a₁, a₂, . . . , a_N}, the mini-batch (of size N) gradient descent formula for Δw_ji can be used: $\Delta w_{ji} = - \eta \frac{1}{N} \sum_{k = 1}^{N} \frac{\partial J_{k}}{\partial w_{ji}},$ (5) where η is the learning rate and J_k is the loss function considering token k (see footnote 1).

We can now define the surprise measure more formally. Given a sequential generative model G_s, an artifact a has a surprise over G_s equal to: $S (a, G_{s}) = \operatorname{avg}_{j, i} \left| \frac{\Delta w_{ji}}{w_{ji}} \right| .$ (6)

We note that the correction is divided by the weight to represent the degree of correction, i.e., the influence of the artifact. Moreover, the learning rate in Equation (5) is not the learning rate used during training, but a parameter to adjust the magnitude of the correction for the surprise measure. Even a value of 1 can be reasonable in certain problems. Finally, this approach requires G_s to treat artifacts as sequential data, even if they are continuous. In the case of images, G_s may be, for instance, an autoregressive model [33, 45].
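
Equations (5) and (6) can be sketched on a toy model whose gradients are available in closed form. The example below uses a bigram softmax model in place of the LSTM-based generator, which is purely an assumption for illustration: weights[p][n] is the logit of token n following token p, J_k is the cross-entropy loss of each next-token prediction, and all weights are assumed to be nonzero.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surprise(weights, tokens, eta=1.0):
    """Equations (5)-(6) for a toy bigram model with nonzero weights.

    weights[p][n] is the logit of token n following token p;
    tokens is the artifact as a list of token ids (length >= 2).
    """
    vocab = len(weights)
    grad = [[0.0] * vocab for _ in range(vocab)]
    pairs = list(zip(tokens, tokens[1:]))
    for prev, nxt in pairs:
        probs = softmax(weights[prev])
        for i in range(vocab):
            # d(cross-entropy)/d(logit_i) = p_i - 1[i == target]
            grad[prev][i] += probs[i] - (1.0 if i == nxt else 0.0)
    ratios = []
    for p in range(vocab):
        for n in range(vocab):
            delta = -eta * grad[p][n] / len(pairs)  # Equation (5)
            ratios.append(abs(delta / weights[p][n]))
    return sum(ratios) / len(ratios)  # Equation (6)
```

No weight is actually updated: the correction is computed once and only its relative magnitude is reported. An artifact the model already predicts well yields a correction close to zero, while an unexpected one yields a larger relative correction.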

3.4 Putting all together

Given the definitions of V (a, D_v), N (a, D_n) and S (a, G_s) in the previous subsections, the DeepCreativity measure (indicated with DC) is obtained by computing the creativity of a generative agent producing artifact a over a temporal and cultural context TCC as: $DC (a, TCC) = \alpha_{1} V (a, D_{v}) + \alpha_{2} N (a, D_{n}) + \alpha_{3} S (a, G_{s}),$ (7) where α₁, α₂, α₃ ∈ [0, 1] and α₁ + α₂ + α₃ = 1. D_v, D_n and G_s are trained over TCC, which is a set of examples (x₁, y₁) , . . . , (x_n, y_n) where x₁, . . . , x_n are the real artifacts, and y₁, . . . , y_n are their labels representing the class (assuming N different values). α₁, α₂, α₃ weight the three single components of creativity; the simplest setting is to consider them equal, as we will do in the following experiments. Nonetheless, it is possible to change them according to the specific domain, if some of the properties are found to be more relevant in creativity assessment.
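
Equation (7) amounts to a convex combination of the three component scores. The sketch below assumes the three scores have already been computed; the function name and the validation of the weights are choices made for this example.

```python
def deep_creativity(value, novelty, surprise, alphas=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (7): weighted sum of value, novelty and surprise.

    alphas must be non-negative and sum to 1; equal weights are the default.
    """
    a1, a2, a3 = alphas
    if min(alphas) < 0 or abs(a1 + a2 + a3 - 1.0) > 1e-9:
        raise ValueError("alphas must be non-negative and sum to 1")
    return a1 * value + a2 * novelty + a3 * surprise
```

Changing the weights, for instance to (0.5, 0.25, 0.25), would emphasize value over the other two components for a domain where that property is considered more relevant.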

4 Experiments

There is no common agreement about how to evaluate creativity measures. None of the methodologies discussed in Section 2 has been evaluated against a ground truth; rather, they have been tested on a generative system, compared with human judgements (always considering the products of a generative system), or not tested at all. This can be attributed to the difficulty of finding a common definition of creativity, which is reflected in the lack of a proper evaluation of creative productions.

However, a ground truth about the creative process does exist in this case: art history. The fact that at a certain moment in history, in a certain place, an artwork was appreciated or at least considered of sufficient quality to be “printable” may be used as useful information for evaluating a creative agent. Inspired by the considerations made in [30] about CAN and its ability to intercept the historical trajectory of art, we define a meta-evaluation test, based on historical trajectories, to study if and how the proposed measure is able to correctly capture the changes of creativity over time in a fixed culture. In particular, the following experiment concerns the context of American poetry. American poetry has been chosen because of the importance of this artistic production and its heterogeneity in terms of movements and styles, while still referring to a specific cultural context. Other cultures or artistic fields could have been selected as well; while the study presented in this work aims at demonstrating the effectiveness of DeepCreativity in principle, we are aware that further work is necessary to demonstrate its generality (see Section 5).

The goal of this experiment is to measure the creativity of poems from different moments of history, while training the neural networks for the computation of DeepCreativity on a specific historical context. DeepCreativity can therefore be considered an appropriate creativity measure if the resulting creativity is higher for the artworks that actually come after the context, because these are the ones that were considered highly creative at that moment. Consequently, it should also recognize the other works as less creative: later works should be judged more novel and surprising, but less valuable and understandable, while contemporary works should be judged more valuable but less novel and surprising.

Two separate experiments are conducted to verify whether the measure is able to capture creativity for a certain period of time. Both of them use poems from the 19th century (i.e., from American Renaissance (Brahmins and Romantics), Local Color, Naturalism, and Neogothic (or Protodecadentism)) as the context. The first experiment involves poems from the 20th century (i.e., a selection of poems from Imagism, Harlem Renaissance, Objectivism, Beat Generation, and Confessional Movement). The second one also considers poems from the 17th and the 18th century (i.e., a selection of poems from Puritanism, African-American Poetry, and American Revolution). A sample of poems from the training set is always considered for a complete comparison; full details are reported in Appendix C. As far as the implementation is concerned, two types of NNs have been used: an LSTM-based RNN for the generative models, and a CNN for the discriminative models. Full details about architectures and training processes can be found in Appendix B.

With respect to the first experiment, Fig. 1 shows the average of the creativity components across movements and the final creativity measure. It is interesting to note that novelty increases with the distance from the training set. This correctly captures the fact that a movement that immediately follows a certain period has to be novel with respect to it. Moreover, the next movement has to be novel with respect to both the works produced in that period and the first one. The surprise curve generally shows a similar behavior: temporally distant artifacts are the result of different contexts and different situations, and they are more difficult to predict considering only a past version of the same culture. The last movement, the Confessional one, could be considered an exception. This can be explained by considering how surprise is measured: it is calculated as the degree of change that the work causes in a model of 19th century American poems, which is strictly related to a semantic view of the context, because it is based on the content. Indeed, temporally distant movements might have a lower surprise measure if their themes (e.g., love) are semantically closer to those in the training set. The same consideration can be made to explain the value curve. For the first four movements, it tends to decrease with time, as expected. On the other hand, the Confessional Movement has a higher value; since its semantic content is closer to that of the context, it results in a more similar, and therefore more comprehensible and admirable, style, with a higher value.

Fig. 1

The average of value, novelty, surprise and creativity computed on a sample from the training set and on 20th century American poems.

In general, it is possible to observe that creativity tends to decrease with the temporal distance from the reference period of the training set, while it is higher for the central movement, which is able to reconcile a high degree of surprise with no substantial loss in value.

With respect to the second experiment, Fig. 2 shows the three components when earlier movements are also considered. This should help study the appropriateness of the three measures over the time dimension. It is interesting to note that two of the three curves follow the same trends observed for the subsequent century. Novelty is the only one behaving differently, since the closest movement has more or less the same novelty as the furthest. Value, instead, decreases with the distance from the reference period of the training set, as desired; in the same way, surprise increases. In addition, surprise is on average smaller than for the 20th century. This can be explained by observing that the 19th century poems include some knowledge about the previous poems, making the earlier poems more predictable.

Fig. 2

The average of value, novelty, surprise and creativity computed on a sample from the training set and on both 18th and 20th century American poems.

5 Conclusion and future work

In this work, we have introduced DeepCreativity, a new creativity measure based on three components with the objective of measuring the value, novelty and surprise of a generative process or algorithm, in terms of its products. This general approach overcomes the limits of having measures applicable only to certain domains; in addition, the use of deep learning techniques overcomes the limits of having to manually define the attributes or the components which characterize creativity. Moreover, the automation of evaluation allows for embedding DeepCreativity in the creative generation itself; for instance, a generative model can be trained to learn to maximize such a measure [4] or it can be used inside a generate-and-test iterative process [41]. Finally, the need for a training set allows for the definition of a specific context of evaluation, which has been found to be a fundamental constraint of creativity. However, a few limitations can also be found: novelty only considers the style or the genre, while it might lie in other traits of a work; surprise requires a sequential generator, which might not be optimal for (supposedly simpler) continuous tasks.

The experiments conducted in the context of generative learning of 19th century American poetry have demonstrated that the measure is able to capture the historical trajectory of creativity over time, whether considering only later poems or also earlier ones, showing its effectiveness. Additional tests should be carried out in order to confirm the correctness and the general applicability of the measure, ideally in different domains; contemporary contexts should be considered too, in order to have the evaluation of DeepCreativity validated against human judges (as in [10]). This is part of our future research agenda.

Footnotes

1. It is worth noting that the loss function represents, in a way, the expectation error, i.e., the surprisal.

References

1. Aleinikov A.G., Kackmeister and Koenig, Creating Creativity: 101 Definitions (what Webster Never Told You). Alden B. Dow Creativity Center Press, 2000.

2. Baldi and Itti, Of bits and wows: a Bayesian theory of surprise with applications to attention, Neural Networks: The Official Journal of the International Neural Network Society 23 (2010), 649–666.

3. Berlyne D.E., Aesthetics and Psychobiology. Appleton-Century-Crofts, 1971.

4. Berns and Colton, Bridging Generative Deep Learning and Computational Creativity. In ICCC 2020, pages 406–409, 2020.

5. Boden M.A., The Creative Mind: Myths and Mechanisms. Routledge, 2003.

6. Bunescu R.C. and Uduehi O.O., Learning to Surprise: A Composer-Audience Architecture. In ICCC 2019, pages 41–48. Association for Computational Creativity (ACC), 2019.

7. Colton and Wiggins G.A., Computational Creativity: The Final Frontier? In ECAI 2012, 2012.

8. Corning, Nature’s Magic: Synergy in Evolution and the Fate of Humankind. Cambridge University Press, 2003.

9. Duchi, Hazan and Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research 12 (2011), 2121–2159.

10. Elgammal, Liu, Elhoseiny and Mazzone, CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms. In ICCC 2017, 2017.

11. Elgammal and Saleh, Quantifying Creativity in Art Networks. In ICCC 2015, pages 39–46, 2015.

12. Fedus, Goodfellow and Dai A.M., MaskGAN: Better Text Generation via Filling in the _______. In ICLR 2018, 2018.

13. Foster, Generative Deep Learning. O’Reilly, 2019.

14. Franceschelli and Musolesi, Creativity and Machine Learning: A Survey, 2021. arXiv:2104.02726 [cs.LG].

15. França, Goes L.F.W., Amorim, Rocha R.C.O. and Da Silva A.R., Regent-Dependent Creativity: A Domain Independent Metric for the Assessment of Creative Artifacts. In ICCC 2016, 2016.

16. Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville and Bengio, Generative Adversarial Nets. In NeurIPS 2014, pages 2672–2680. Curran Associates, Inc., 2014.

17. Grace and Maher M.L., What to expect when you’re expecting: The role of unexpectedness in computationally evaluating creativity. In ICCC 2014, 2014.

18. Gui, Sun, Wen, Tao and Jie-Ping, A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications, 2020. arXiv:2001.06937 [cs.LG].

19. Huszár, How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? 2015. arXiv:1511.05101 [stat.ML].

20. Jha, Without Claude Shannon’s information theory there would have been no internet. The Guardian. 2016. https://www.theguardian.com/science/2014/jun/22/shannon-information-theory

21. Karampiperis, Koukourikos and Koliopoulou, Towards Machines for Measuring Creativity: The Use of Computational Tools in Storytelling Activities. In ICALT 2014. IEEE, 2014.

22. Kim, Convolutional Neural Networks for Sentence Classification. In EMNLP 2014, pages 1746–1751, 2014.

23. Kingma D.P. and Ba, Adam: A Method for Stochastic Optimization, 2014. arXiv:1412.6980 [cs.LG].

24. Macedo, Reisenzein and Cardoso, Modeling forms of surprise in artificial agents: empirical and theoretical study of surprise functions. In CogSci 2004, pages 873–878, 2004.

25. Maher, Evaluating creativity in humans, computers, and collectively intelligent systems. In DESIRE 2010, pages 22–28, 2010.

26. Maher and Fisher, Using AI to Evaluate Creative Designs. ICDC 2012, 1, 2012.

27. Margoni, Artificial Intelligence, Machine Learning and EU Copyright Law: Who Owns AI? CREATe Working Paper, 2018.

28. Meemulla Kandi, Language Modelling for Handling Out-of-Vocabulary Words in Natural Language Processing. PhD thesis, London School of Economics and Political Science, 2018.

29. Mikolov, Chen, Corrado and Dean, Efficient Estimation of Word Representations in Vector Space, 2013. arXiv:1301.3781 [cs.CL].

30. Miller A.I., The Artist in the Machine. The MIT Press, 2019.

31. Nam, Choi, Lee, Chou S.-Y. and Yang Y.-H., Deep Learning for Audio-Based Music Classification and Tagging: Teaching Computers to Distinguish Rock from Bach, IEEE Signal Processing Magazine 36(1) (2019), 41–51.

32. Papineni, Roukos, Ward and Zhu W.-J., BLEU: A Method for Automatic Evaluation of Machine Translation. In ACL 2002, pages 311–318. Association for Computational Linguistics, 2002.

33. Parmar, Vaswani, Uszkoreit, Kaiser, Shazeer, Ku and Tran, Image Transformer. In PMLR 2018 80 (2018), 4055–4064.

34. Rahwan, Cebrian, Obradovich, Bongard, Bonnefon J.-F., Breazeal, Crandall J.W., Christakis N.A., Couzin I.D., Jackson M.O., Jennings N.R., Kamar, Kloumann I.M., Larochelle, Lazer, McElreath, Mislove, Parkes D.C., Pentland, Roberts M.E., Shariff, Tenenbaum J.B. and Wellman, Machine behaviour, Nature 568 (2019), 477–486.

35. Ritchie, Some empirical criteria for attributing creativity to a computer program, Minds and Machines 17 (2007), 67–99.

36. Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, Berg A.C. and Fei-Fei, ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (2015), 211–252.

37. Schmidhuber, Formal theory of creativity, fun, and intrinsic motivation (1990–2010), IEEE Transactions on Autonomous Mental Development 2(3) (2010), 230–247.

38. Schmidt R.M., Schneider and Hennig, Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers. In ICML 2021 139 (2021), 9367–9376.

39. Shannon C.E., A mathematical theory of communication, The Bell System Technical Journal 27(3) (1948), 379–423.

40. Tan W.R., Chan C.S., Aguirre H.E. and Tanaka, Ceci n’est pas une Pipe: A Deep Convolutional Network for Fine-Art Paintings Classification. In ICIP 2016, pages 3703–3707, 2016.

41. Toivonen and Gross, Data mining and machine learning in computational creativity, WIREs Data Mining and Knowledge Discovery 5(6) (2015), 265–275.

42. Treffinger D.J., Creativity, Creative Thinking, and Critical Thinking: In Search of Definitions, Center for Creative Learning, 1996.

43. Tribus, Thermodynamics and Thermostatics: An Introduction to Energy, Information and States of Matter, with Engineering Applications. Van Nostrand, 1961.

44. Van Den Oord, Kalchbrenner and Kavukcuoglu, Pixel Recurrent Neural Networks. In ICML 2016, pages 1747–1756. JMLR.org, 2016.

45. Van Den Oord, Kalchbrenner, Vinyals, Espeholt, Graves and Kavukcuoglu, Conditional Image Generation with PixelCNN Decoders. In NeurIPS 2016, pages 4797–4805. Curran Associates Inc., 2016.

46. Varshney L.R., Pinel, Varshney K.R., Bhattacharjya, Schoergendorfer and Chee Y.-M., A big data approach to computational creativity: The curious case of Chef Watson, IBM Journal of Research and Development 63(1) (2019), 7:1–7:18.

47. Weaver and Tao, The Optimal Reward Baseline for Gradient-Based Reinforcement Learning. In UAI 2001, pages 538–545, 2001.

48. Wiggins G.A., Searching for computational creativity, New Generation Computing 24 (2006), 209–222.

49. Williams R.J., Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8 (1992), 229–256.

50. Yi, Sun, Li and Li, Automatic Poetry Generation with Mutual Reinforcement Learning. In EMNLP 2018, pages 3143–3153. Association for Computational Linguistics, 2018.

51. Yu, Zhang, Wang and Yu, SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI 2017, pages 2852–2858. AAAI Press, 2017.

52. Zhang and Lapata, Chinese Poetry Generation with Recurrent Neural Networks. In EMNLP 2014, pages 670–680. Association for Computational Linguistics, 2014.