Generate custom travel magazine layouts

Abstract

Among the problems involved in specifying the style and the number of elements of a travel magazine, generating a magazine layout that is constrained by text remains complex and unsolved. In this paper, we use a generative adversarial network (GAN) to generate layouts that satisfy text constraints. Because graphic designs are complex and varied, we enhance both the discriminator and the generator so that the generated layouts adhere more closely to the constraints. First, we feed pairs of non-corresponding constraint text and real layouts to the discriminator to improve its performance; second, we add a spatial attention mechanism to the layout encoder to extract layout features and generate high-quality layouts. We demonstrate that the proposed method generates high-quality layouts that satisfy the text constraints, and we validate its effectiveness through user ratings.

Keywords

Layout, generative adversarial network, layout design, customization

1. Introduction

Advertisements in travel magazines have grown along with the development of tourism and advertising in China. What tourism operators sell to tourists is the travel experience, most of which is an intangible product that cannot be seen, touched, or taken away [1]. Conveying commodity information (tourism products and destination information) to such customers (tourists) is therefore more difficult than for ordinary commodities, so the development and prosperity of the tourism industry depend to a large extent on the support of advertisements [2]. The practice of the tourism industry also shows that travel magazine advertisements are an important way to present the image of a tourist destination and the charm of tourism products to tourists. The layout is the basis of travel magazine advertisement design. The layout design of print advertisements is important in multimedia creation and has a wide range of applications in advertising, books, and websites. A good design is visually pleasing while conveying information effectively, but high-quality travel magazine advertisement design depends heavily on experienced designers [3]. When designing a magazine, the designer often needs to create a layout that combines multiple elements, and sometimes needs to design magazine advertisements with a similar style but different layouts. The designer usually starts from the available materials (such as how many pictures can be used in total and how many text descriptions there are) and then designs the travel magazine advertisement according to the number of materials, the size of the pictures, and the desired design style, which takes a lot of time. There are many possible ways to organize a given set of elements into a layout. A good layout must meet design goals (such as reading order and alignment between elements), and changing one element may require reorganizing many others [4].

In this work, our objective is to generate layouts for tourism magazine advertisements based on user input constraints, such as the number of available design materials or layout style preferences. As shown in Fig. 1, we aim to create high-quality tourism magazine advertisement layouts that satisfy the given constraints (for example, a three-column composition with one red area, three yellow areas, and one green area). Previous research has started to employ learning-based methods for magazine advertisement layout generation and has achieved some success in generating natural, realistic images [5]. However, this field has not been extensively explored. Existing approaches mainly rely on templates or heuristic rules for layout generation [6], and they are limited in handling relationships between elements, such as interdependencies, alignment, and overlap. Moreover, using generative models to produce effective layouts that respect user style preferences and constraints is itself challenging. Further research is therefore necessary to capture element relationships and user constraints with learning methods and so solve the problem of generating tourism magazine advertisement layouts.

Fig. 1.

Layout generated according to user constraints, and an example design constructed from the generated layout.

Therefore, this paper proposes a method for generating customized layouts for travel magazines, in which the travel magazine advertisement layout is designed according to user input constraints.

In summary, we make the following contributions to this work:

• We introduce a spatial attention mechanism into the GAN and learn the features of the layout image through a recurrent neural network, which improves the performance of the generator and raises the quality of the generated layouts.

• We introduce a mixer into the GAN that adds a third type of input to the discriminator, consisting of mismatched text and real layouts. This improves the performance of the discriminator and lets it learn the constraint relationship between layout and text.

2. Related work

Travel magazine advertisements superimpose images and text, and present text information and images through a certain typesetting [7]. A travel magazine advertisement is a kind of graphic design that is usually used to show the scenery of scenic spots and to convey information about them to tourists in order to promote tourist attractions. Creating high-quality travel magazine advertisements can be difficult and time-consuming when designers face a plethora of design choices. Given images and text descriptions of specific attractions, designers usually need to explore a large number of layout styles before designing advertisements according to them, and sometimes they need to produce a large number of magazine advertisements. Research on the intelligent generation of travel magazine advertisements can therefore save designers a lot of time in layout design, and can also allow non-design professionals to design high-quality, harmonious, and attractive travel magazine advertisements. The most important aspect of travel magazine advertisement design is the layout of each element (title, subtitle, image, text, etc.). Layout is at the heart of graphic design, including magazines, posters, comics, and web pages [8]. A high-quality layout facilitates the presentation of information while capturing the reader’s attention and enhancing the visual impact of a travel magazine advertisement. In recent years, the problem of graphic design layout has received increasing attention from the graphics community. Some previous work attempts to model graphic design layouts in order to generate layouts guided by style, perception, and aesthetics. Rich layout variations in graphic design are largely driven by the visual and textual content to be presented [9]. Automatic layout generation is a major research focus in the field of graphic design [10, 11].
In automatic layout generation, the grid layout is used as a design principle in many toolkits and layout managers, which provide interactive aids such as grid-snapping and automatic alignment [12]. Hosobe et al. [13] address nonlinear geometric constraints such as Euclidean geometric, non-overlapping, and graph layout constraints, and also discuss soft constraints with hierarchical dominance or preference. Lutteroth et al. [14] proposed the ALM layout model, a constraint-based technique for specifying 2-D layouts. Hosobe et al. [15] proposed a hierarchical method for solving soft nonlinear constraints, which uses hierarchical preferences to handle soft nonlinear constraints and computes a solution that satisfies as many constraints with strong preferences as possible; by adopting the method of Lagrange multipliers, it can compute local solutions exactly. Dayama et al. [16] proposed an interactive layout design approach based on integer programming: given the size of each element, a mixed integer linear programming method is used to make the elements non-overlapping, thereby designing the layout of a magazine. A graphical user interface can be designed using a grid layout, in which the spatial structure is defined by grid lines that guide the size and position of GUI elements to simplify layout design [17]. Grid-based layout design is mostly manual; the computer only assists in making each module as visually pleasing as possible without overlapping. Damera-Venkata et al. [18] proposed modeling the relationships between page elements and solving the document arrangement problem through probabilistic inference in a Bayesian network. In addition, automatic layout design systems are also popular in automatic typesetting, for example for generating a magazine from preferences set by the user. Li J et al. [19] proposed Layout-GAN to synthesize layouts by modeling the geometric relationships of different types of 2D elements.
Lee et al. [6] proposed a design layout generation method that satisfies user-specified constraints. A visual text representation layout with predefined layout templates and aesthetic design principles was designed in [20]. Biswas et al. [21] proposed a method for the automatic synthesis of document images based on a given layout. Guo et al. [22] designed the Vinci system, which uses a deep generative model to match a product image with a set of design elements and layouts to generate aesthetically pleasing posters. Li C et al. [23] proposed an efficient deep aesthetic learning method to generate harmonious text layouts on natural images. Li J et al. [24] introduced attribute-conditional layout GANs, which incorporate the attributes of design elements into graphic layout generation by forcing both the generator and the discriminator to satisfy attribute conditions. Arroyo D et al. [25] proposed exploiting the properties of self-attention layers to capture high-level relationships between elements in a layout, and used these as building blocks of the well-known variational autoencoder (VAE) formulation. Kikuchi K et al. [26] proposed a generative layout model built on the transformer architecture and formulated layout generation as a constrained optimization problem, in which constraints are designed for element alignment, overlap avoidance, or any other user-specified relationship. Zhou M et al. [27] proposed a deep generative model called composition-aware graphic layout GAN (CGL-GAN), which synthesizes layouts based on the global and spatial visual content of input images.

3. Our model

The ultimate goal of this paper is to determine the layout of a travel magazine advertisement from the design style or the number of specific elements input by the user; the layout is generated by the model shown in Fig. 2. Each layout $p_{i}$ is described by nine short sentences $T_{i}=\{{t_{1},t_{2},t_{3},\ldots,t_{9}}\}$ , where $T_{i}$ is the constraint text of $p_{i}$ . A text encoder converts $T_{i}$ into a vector $Y_{i}$ . Gaussian noise $Z$ and the vector $Y_{i}$ are input into the generator G to generate the layout $p_{i}^{\prime}$ . The layout encoder maps the real layout $p_{i}$ to a vector $X_{i}$ and the generated layout $p_{i}^{\prime}$ to a vector $X_{i}^{\prime}$ . The generated layout and the real layout are then input into the discriminator D, which determines whether the input layout is generated (G) or real (R).

Fig. 2.

The framework of our model.

3.1 Text encoder

The text data is processed so that each text is converted into a one-hot encoded matrix and then passed through an embedding layer and an RNN layer, as shown in Fig. 3.

Fig. 3.

Text encoding model diagram.

Text encoding learns text features from the text content and uses them to guide the generation of magazine advertisement layouts. The text label of each layout consists of nine phrases: the layout style, the text proportion, the picture proportion, the title proportion, the background proportion, the proportion of titles on pictures, and the proportion of text on pictures, together with the total element quantity and the specific quantity of each single element. The constraint text of each travel magazine advertisement layout is converted into a matrix by one-hot encoding and then passed through the word embedding layer, whose output is $s$ ; finally, the recurrent neural network (RNN) layer outputs the vector $y$ , as shown in Fig. 3:

$\displaystyle y=\textit{RNN}(s).$ (1)

Here $x=\{x_{i}|i=1,2,\ldots,n,\ldots,m\}$ is the one-hot input, where $n$ is the number of phrases in the constraint text of the travel magazine advertisement layout; if the number of words is less than $m$ , the gaps are filled with zeros. $s=\{s_{1},s_{2},s_{3},\ldots,s_{m}\}\in R^{D_{1}\times m}$ is the embedding output, where $D_{1}=256$ is the dimension of $s$ ; $y=\{y_{i}|i=1,2,\ldots,m\}\in R^{D_{2}\times m}$ is the RNN output, where $D_{2}=128$ is the dimension of $y$ .
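
To make the one-hot step and the zero filling concrete, the following is a minimal sketch under stated assumptions; it is not the implementation used in this paper. The function name `one_hot_encode`, the toy vocabulary, and the maximum length of 4 are hypothetical, and the embedding and RNN layers are omitted.

```python
def one_hot_encode(words, vocab, max_len):
    """Return a max_len x len(vocab) one-hot matrix, zero-padded at the end."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for word in words[:max_len]:
        row = [0] * len(vocab)
        row[index[word]] = 1          # exactly one active entry per word
        matrix.append(row)
    # Positions beyond the actual text length are filled with zero rows.
    while len(matrix) < max_len:
        matrix.append([0] * len(vocab))
    return matrix


# Hypothetical toy vocabulary and a two-word constraint fragment.
vocab = ["three-column", "composition", "one", "red", "area"]
x = one_hot_encode(["three-column", "composition"], vocab, max_len=4)
```

Here `x` has four rows: two one-hot rows for the two words, followed by two all-zero rows that play the role of the padding described above.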

3.2 Layout encoder

Attention-based models have gained popularity in a variety of computer vision and machine learning tasks, including neural machine translation, image classification, image segmentation, image and video captioning, and visual question answering. Attention improves the performance of all these tasks by encouraging the model to focus on the most relevant parts of the input.

Fig. 4.

Layout coding model diagram.

In the layout encoder, the input is the real layout P, and the output is the feature vector X of the layout. Each layout has shape (64 $\times$ 45 $\times$ 3). First, the layout is zero-padded to shape (64 $\times$ 64 $\times$ 3). A spatial attention mechanism (SAM) is then added between the convolutional layers to strengthen the network’s ability to extract image features and thus enhance the layout features, as shown in Fig. 4.

$\displaystyle z_{l}=\frac{1}{H\times W}\sum\limits_{i=1}^{H}\sum\limits_{j=1}^{W}p_{l}({i,j})$ (2)

$\displaystyle s_{l}=\sigma({g({z_{l},W})})=\sigma({W_{2}\sigma({W_{1}z_{l}})})$ (3)

$\displaystyle\tilde{p}_{l}=s_{l}p_{l}$ (4)

$\displaystyle X_{l}=\textit{CNN}(\tilde{p}_{l})$ (5)

The input is $P=[{p_{1},p_{2},\ldots,p_{l}}]$ , a group of layout images composed of $p_{1},p_{2},\ldots,p_{l}$ , where each channel of a layout image is a map $x\in R^{H\times W}$ and $z_{l}\in R^{C}$ . $H$ and $W$ denote the height and width of the layout image, and $C$ is the number of image channels. $\sigma$ refers to the ReLU function, and Eq. (3) corresponds to two fully connected layers, which keeps the model from becoming too complicated. $W_{1}\in R^{\frac{C}{r}\times C}$ and $W_{2}\in R^{C\times\frac{C}{r}}$ are the weight matrices of these layers, where the reduction ratio $r$ is a hyperparameter that changes the capacity and computational cost of the network. $s_{l}$ is the intermediate hidden state of the SAM, and $\tilde{P}=[\tilde{p}_{1},\tilde{p}_{2},\ldots,\tilde{p}_{l}]$ . The obtained $\tilde{P}$ is input into the CNN to learn to capture the global and local behaviors of the elements, and the final output is $X=[{X_{1},X_{2},\ldots,X_{l}}]$ .
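
Equations (2) to (4) can be traced on a small example. The sketch below is only an illustration under stated assumptions, not the network used in this paper: the function name `attention_rescale`, the toy input, and the weight values are hypothetical, $\sigma$ is taken to be ReLU as stated in the text, and the CNN of Eq. (5) is omitted.

```python
def relu(vector):
    return [max(0.0, value) for value in vector]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def attention_rescale(p, w1, w2):
    """p: one layout image given as C channel maps, each an H x W nested list."""
    # Eq. (2): average every channel map over its H x W positions.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in p]
    # Eq. (3): two fully connected layers with sigma = ReLU.
    s = relu(matvec(w2, relu(matvec(w1, z))))
    # Eq. (4): rescale every channel map by its weight.
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, p)]


# Toy input with C = 2 channels of size 2 x 2 and reduction ratio r = 2,
# so W1 has shape (C/r) x C = 1 x 2 and W2 has shape C x (C/r) = 2 x 1.
p = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [0.5]]
p_tilde = attention_rescale(p, w1, w2)
```

For this input the channel averages are 4 and 2, the hidden value is 3, and the resulting weights are 3 and 1.5, so the first channel is scaled by 3 and the second by 1.5.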

3.3 Mixer

Fig. 5.

Mix and shuffle constraint text and layout.

The input to the discriminator is a pair $({T,P})$ of constraint text and layout, and the discriminator judges whether P in the input $({T,P})$ is a generated layout or a real layout. However, the early discrimination performance of such a method is poor, and the training time is long. The discriminator may ignore the constraint information T and judge only whether the layout $P^{\prime}$ looks generated, without checking whether it satisfies the constraint text T. Therefore, adding to the discriminator a third type of input, synthesized by the mixer, improves the performance of the discriminator and also lets it learn the constraint relationship between the layout and the constraint text. The role of the mixer is to pair a real travel magazine advertisement layout with constraint text that does not match it, and to let the discriminator identify whether the real layout satisfies the given constraint text. There are thus three kinds of inputs to the discriminator: a ground-truth layout with matching constraint text, a generated layout with arbitrary constraint text, and a ground-truth layout with non-matching constraint text. The mixer combines real constraint texts T and real layouts P into text and layout pairs $({T_{i},P_{j}})$ , where $i$ and $j$ may be equal. If $i=j$ , the input belongs to the first case; if $i\neq j$ , it belongs to the third case, as shown in Fig. 5. Introducing the mixer therefore enhances the performance of the discriminator.
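
The pairing performed by the mixer can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the function name `mix_pairs`, the use of a random shuffle, and the string placeholders for texts and layouts are hypothetical.

```python
import random


def mix_pairs(texts, layouts, seed=0):
    """Pair each real text T_i with a real layout P_j and label the pair."""
    rng = random.Random(seed)
    order = list(range(len(layouts)))
    rng.shuffle(order)                    # random assignment of j to each i
    pairs = []
    for i, j in enumerate(order):
        # i == j is the first discriminator case (matching text),
        # i != j is the third case (real layout, non-matching text).
        case = "matched" if i == j else "mismatched"
        pairs.append((texts[i], layouts[j], case))
    return pairs


texts = ["T0", "T1", "T2", "T3"]
layouts = ["P0", "P1", "P2", "P3"]
pairs = mix_pairs(texts, layouts)
```

Every real layout is used exactly once, and the label of each pair records whether the discriminator should treat it as a matching or a non-matching input.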

3.4 Generative adversarial networks

A generative adversarial network (GAN) is composed of a generator G and a discriminator D. In this two-player system, the generator tries to confuse the discriminator, while the discriminator tries to judge the source of each input image as accurately as possible. The two models are adversaries, and each improves by trying to beat the other. Through the discriminator, the generator receives feedback on whether its generated images are consistent with the image distribution of the dataset, and through the generator, the discriminator obtains more training samples. $D$ and $G$ thus play a minimax game with value function $V({D,G})$ .

$\displaystyle\min\limits_{G}\max\limits_{D}V({D,G})=E_{x\sim p_{\textit{data}}(x)}[{\log D(x)}]+E_{z\sim p_{z}(z)}[{\log({1-D({G(z)})})}]$ (6)

$\displaystyle D(x)=\frac{p_{\textit{data}}(x)}{p_{\textit{data}}(x)+p_{g}(x)}$ (7)

Goodfellow I et al. [28] proved that this minimax game has a global optimum exactly when $p_{g}=p_{\textit{data}}$ , and that $p_{g}$ converges to $p_{\textit{data}}$ under mild conditions (e.g., $G$ and $D$ have enough capacity). In practice, at the beginning of training, samples from $G$ are very poor and are rejected by $D$ with high confidence. It has been found that the generator then works better when it maximizes $\log({D({G(z)})})$ rather than minimizing $\log({1-D({G(z)})})$ .

In the generator $G$ , noise $z\in R^{Z}\sim N({0,1})$ is first sampled from a Gaussian distribution; $z$ and $y$ are then input into the generator, which outputs the generated layout $p^{\prime}$ conditioned on the text T and the noise sample z.

In the discriminator $D$ , multi-layer convolutions are performed with spatial batch normalization followed by ReLU. The text T is reduced in dimensionality by a fully connected layer and then rectified. When the spatial dimension of the discriminator is $4\times 4$ , the text embedding is replicated spatially and concatenated along the depth dimension. A $1\times 1$ convolution is then performed, followed by rectification and a $4\times 4$ convolution that computes the final score of $D$ . Batch normalization is performed on all convolutional layers.

$\displaystyle{\cal L}_{D}=\log({D({Y,X})})+\frac{1}{2}({\log(1-D({Y,X^{\prime}}))+\log(1-D({Y^{\prime},X}))})$ (8)

$\displaystyle D\leftarrow D-\alpha\frac{\partial{\cal L}_{D}}{\partial D}$ (9)

$\displaystyle{\cal L}_{G}=\log({D({Y,X^{\prime}})})$ (10)

$\displaystyle G\leftarrow G-\alpha\frac{\partial{\cal L}_{G}}{\partial G}$ (11)

Equations (8) and (10) are the loss functions of the discriminator and the generator, respectively, where $D(Y^{\prime},X)$ is the score of an unmatched text and layout pair. Equations (9) and (11) update the network parameters of the discriminator D and the generator G with step size $\alpha$ .
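
As a numerical illustration of Eqs. (8) and (10), the sketch below evaluates both losses from three discriminator scores in (0, 1). It is a minimal sketch under stated assumptions: the function and argument names are hypothetical, and the parameter updates of Eqs. (9) and (11) are not shown.

```python
import math


def discriminator_loss(s_real, s_fake, s_wrong):
    """Eq. (8): s_real = D(Y, X), s_fake = D(Y, X'), s_wrong = D(Y', X)."""
    return math.log(s_real) + 0.5 * (math.log(1 - s_fake) + math.log(1 - s_wrong))


def generator_loss(s_fake):
    """Eq. (10): s_fake = D(Y, X')."""
    return math.log(s_fake)


# Hypothetical scores: a confident discriminator early in training.
loss_d = discriminator_loss(s_real=0.9, s_fake=0.1, s_wrong=0.1)
loss_g = generator_loss(s_fake=0.1)
```

The generated pair and the mismatched pair each contribute with weight 1/2, so the mismatched real layouts supplied by the mixer influence the discriminator as much as the generated layouts do.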

3.5 Generate layout optimization

Fig. 6.

Top: the layout generated by the generator; bottom: the optimized layout.

GANs are known to have difficulty reconstructing fine visual detail. This does not affect us much, because we generate a layout structure rather than a sharp photograph. However, the images generated by the GAN still contain noise points, and the element boundaries in the layout are not generated as standard rectangles, so the layout needs to be refined, as shown in Fig. 6. The specific process is as follows. First, the internal noise points of each label after semantic segmentation are identified by color recognition and filled with the color of the corresponding label. Next, the boundary noise points of each label are removed. Finally, the boundaries of each label are corrected as follows: the point sets of the four boundaries of the color area are obtained, and the average point of each boundary is computed; the vertical coordinates are taken from the average points of the upper and lower boundaries, and the horizontal coordinates from the average points of the left and right boundaries; these coordinates are then combined into four points that serve as the four vertices. The rectangle enclosed by the four vertices is taken as the corrected boundary of the color region.
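
The boundary correction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this paper: the function name `rectify_region` is hypothetical, the region is given as a binary mask rather than a color image, the boundary point sets are taken as the first and last region pixel in every occupied column and row, and the noise removal steps are omitted.

```python
def rectify_region(mask):
    """mask: H x W nested list of 0/1 for one color region.

    Returns (left, top, right, bottom) of the corrected rectangle.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    # Point sets of the four boundaries of the region.
    top = [min(r for r in rows if mask[r][c]) for c in cols]
    bottom = [max(r for r in rows if mask[r][c]) for c in cols]
    left = [min(c for c in cols if mask[r][c]) for r in rows]
    right = [max(c for c in cols if mask[r][c]) for r in rows]

    def mean(points):
        return round(sum(points) / len(points))

    # Vertical coordinates come from the upper and lower boundaries,
    # horizontal coordinates from the left and right boundaries.
    return mean(left), mean(top), mean(right), mean(bottom)


# A 4 x 4 block (rows 1-4, columns 1-4) with one stray pixel above it.
mask = [[0] * 6 for _ in range(6)]
for r in range(1, 5):
    for c in range(1, 5):
        mask[r][c] = 1
mask[0][2] = 1
box = rectify_region(mask)
```

Averaging the boundary points suppresses the stray pixel, so the corrected rectangle is the original block with corners (1, 1) and (4, 4).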

4. Experiments

We compare our model with three alternatives: GRIDS, the ground truth (GT), and a Baseline. The Baseline is a basic GAN consisting only of a generator and a discriminator, with nothing else added; it uses the same text encoder as our model, while its layout encoder is a single CNN.

4.1 Data set

Training the model in this paper requires the layouts of real travel magazine advertisements, but only the travel magazines themselves are available, so the travel magazines must be annotated with semantic layout labels.

We use the Python-based Scrapy crawler framework to create a travel magazine advertisement search project and collect 682 travel magazine advertisements from travel websites. After filtering out advertisements that are blurry or otherwise do not meet the requirements, 626 remain, which we split into a training set (70%) and a test set (30%). We set up six types of labels: travel magazine ad text, travel magazine ad title, picture, title on the picture, text on the picture, and background, represented by yellow, green, red, purple, blue, and gray areas, respectively. The layout effect is shown in Fig. 7. This article distinguishes heading elements from other text elements because heading elements play an important role in graphic design layout. We then manually annotate a part of the collected travel magazine advertisements with the six types of labels and use them as the training set for a fully convolutional network (FCN). The trained FCN then divides the remaining travel magazine advertisements into the six label categories through semantic segmentation. Because the training set is relatively small, data augmentation techniques (including random resizing, random horizontal flipping, and random cropping) are used to enlarge it. Each label after semantic segmentation is refined with the same method as the layout optimization in Section 3.5.

We use color recognition (for example, with Python’s OpenCV package) to identify the four vertex coordinates of each color area. From these four coordinates, we can calculate the proportion of each label in the travel magazine advertisement layout. We can also identify the number of color areas in each layout and determine which types of labels each layout is composed of, as well as the number of each label. An investigation of layout classification shows that the layouts can be divided into seven types of composition: circular composition, palace-style composition (including four-square, six-square, and nine-square grid compositions), left-right symmetrical composition, three-column composition, combined composition, split composition (where one label occupies most of the layout), and two-column composition. We manually classify each layout into one of these seven layout styles. The semantic information of each layout is expressed through the nine sentences in Table 1, which serve as the constraint text of the layout. Table 1 is a template for the constraint text.

Table 1
Constraint text template

Layout text conditions
The specific composition type of the layout
Percentage of travel magazine ad text area
The proportion of the image area
The proportion of travel magazine ad headline area
The proportion of the background area
The proportion of the title area on the image
The proportion of text area on the image
The label classes that make up the layout
The number of each label except the background
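
The proportions and counts in Table 1 follow directly from the rectangle vertices of each color area. The sketch below is a minimal illustration under stated assumptions: the function name `label_statistics`, the tuple format of a region, and the example rectangles are hypothetical, and the regions are assumed not to overlap.

```python
def label_statistics(regions, width, height):
    """regions: list of (label, left, top, right, bottom) rectangles in pixels.

    Returns (proportion per label, number of areas per label).
    """
    total = width * height
    area = {}
    count = {}
    for label, left, top, right, bottom in regions:
        area[label] = area.get(label, 0) + (right - left) * (bottom - top)
        count[label] = count.get(label, 0) + 1
    proportion = {label: a / total for label, a in area.items()}
    return proportion, count


# Hypothetical 45 x 64 layout: one picture on top, two text blocks below.
regions = [
    ("picture", 0, 0, 45, 32),
    ("text", 0, 32, 15, 64),
    ("text", 15, 32, 30, 64),
]
proportion, count = label_statistics(regions, width=45, height=64)
```

In this example the picture covers half of the layout and the two text blocks together cover one third; the remaining area would be attributed to the background label.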

When the test set is used to evaluate the model, the input is the nine constraints of the layout and the output is the layout image. In the user evaluation experiments, the constraints input by the user are the style of the layout and the number of elements in the layout. The proportion of each element in the layout is our input when training the model.

Fig. 7.

Example of magazine layout.

4.2 Ablation experiment

To better capture the local information of the layout, such as the regularity of the boundaries of each element and the requirement that generated layout elements do not overlap, we introduce a spatial attention mechanism into the GAN and learn the features of the layout image through the recurrent neural network, which improves the performance of the generator and raises the quality of the generated layout. For verification, our proposed model is compared with the model without the spatial attention mechanism. Note that we do not compare loss functions, because the biggest advantage of adding the attention mechanism is that it makes the boundaries of the generated layout elements more regular. We therefore evaluate the generated layouts quantitatively with an overlap metric (the smaller the value, the better).

Table 2
Overlap index results of the compared models

Model	$L_{\textit{over}}$
Layout-noA	0.183
Baseline	0.191
Ours	0.042

Fig. 8.

Illustration of Eq. (12).

Overlap calculation formula:

$\displaystyle L_{\textit{over}}=\sum\limits_{i=1}^{N}\sum\limits_{\forall j\neq i}\frac{S_{i}\cap S_{j}}{S_{i}\cup S_{j}}$ (12)

Here, $S_{i}\cap S_{j}$ is the overlapping area between elements $i$ and $j$ , $S_{i}\cup S_{j}$ is the area of their union, and $N$ is the number of layout elements. To better explain Eq. (12), we simplify the layout to two elements with a large overlap, as shown in Fig. 8, where $S_{1}\cap S_{2}$ is the part where the two elements overlap (green dashed box) and $S_{1}\cup S_{2}$ is the total area covered by the two regions.
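
For axis-aligned rectangular elements, Eq. (12) can be evaluated as in the following minimal sketch. The function name `overlap_index` and the box format are hypothetical; the sum runs over ordered pairs, so each unordered pair of elements is counted twice, as in Eq. (12).

```python
def overlap_index(boxes):
    """boxes: list of (left, top, right, bottom) rectangles."""

    def area(box):
        return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

    total = 0.0
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            # Intersection rectangle of a and b (empty if they are disjoint).
            inter = area((max(a[0], b[0]), max(a[1], b[1]),
                          min(a[2], b[2]), min(a[3], b[3])))
            union = area(a) + area(b) - inter
            if union > 0:
                total += inter / union
    return total


overlapping = overlap_index([(0, 0, 2, 2), (1, 1, 3, 3)])
disjoint = overlap_index([(0, 0, 1, 1), (2, 2, 3, 3)])
```

In the first call the two squares share an area of 1 and jointly cover an area of 7, so the index is 2/7; in the second call the elements do not touch and the index is 0.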

Fig. 9.

Layouts generated by our model, the Layout-noA model, and the Baseline model. Top: before optimization; bottom: after optimization.

Layout-noA denotes our model without the spatial attention mechanism.

Table 2 and Fig. 9 show that after adding the attention mechanism, the overlapping area between elements in the layout is reduced and the generated layout is more regular. Moreover, the Baseline model should generate three yellow areas in the middle of the layout but generates only two, which indicates that our mixer helps generate layouts that better meet the constraints. Because the layouts generated by the Baseline model do not meet the constraints, they also fail to meet the conditions when text and images are pasted in for the final display. Since the Baseline model performs worse than Layout-noA, the following experiments compare our model directly with Layout-noA.

4.3 Evaluation index

Note that perceived layout quality is prone to subjectivity when it comes to judging good or bad layouts. Therefore, it is difficult to define a metric for judging, so we use four metrics to measure the quality of the generated layouts: Fréchet inception distance (FID) [29], structure similarity index measure (SSIM) [30], alignment [24] and user evaluation.

4.4 Quantitative results

FID: We use this distance to measure the similarity between the real and generated images. The smaller the FID value, the higher the similarity. The best case is FID $=$ 0, which means that the generated and real images are the same.

SSIM: The value range of SSIM is [0, 1]. The larger the value, the smaller the generated layout distortion and the more similar to the real layout structure.

$\displaystyle\mu_{x}=\frac{1}{M}\sum\limits_{i=1}^{M}x_{i}$ (13)

$\displaystyle\sigma_{x}=\left({\frac{1}{M-1}\sum\limits_{i=1}^{M}({x_{i}-\mu_{x}})^{2}}\right)^{\frac{1}{2}}$ (14)

$\displaystyle\sigma_{xy}=\frac{1}{M-1}\sum\limits_{i=1}^{M}({x_{i}-\mu_{x}})({y_{i}-\mu_{y}})$ (15)

$\displaystyle\textit{SSIM}({x,y})=\frac{({2\mu_{x}\mu_{y}+C_{1}})({2\sigma_{xy}+C_{2}})}{({\mu_{x}^{2}+\mu_{y}^{2}+C_{1}})({\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}})}$ (16)

Here $x$ and $y$ represent the generated layout and the real layout, respectively; $\mu_{x}$ and $\mu_{y}$ are their mean values, $\sigma_{x}$ and $\sigma_{y}$ are their standard deviations, $\sigma_{xy}$ is their covariance, and $C_{1}$ and $C_{2}$ are constants. $M$ is the number of pixels in the picture, and $x_{i}$ is the value of the $i$-th pixel.
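
A direct evaluation of Eqs. (13) to (16) over two flattened images is sketched below. This is a minimal sketch under stated assumptions: the function name `ssim` is hypothetical, the statistics are computed once over the whole image rather than over local windows, and the constants $C_{1}=(0.01\times 255)^{2}$ and $C_{2}=(0.03\times 255)^{2}$ are commonly used defaults for 8-bit images that are assumed here because the text does not state them.

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """x, y: equally long lists of pixel values (flattened images)."""
    m = len(x)
    mu_x = sum(x) / m                                      # Eq. (13)
    mu_y = sum(y) / m
    var_x = sum((a - mu_x) ** 2 for a in x) / (m - 1)      # sigma_x ** 2
    var_y = sum((b - mu_y) ** 2 for b in y) / (m - 1)      # sigma_y ** 2
    cov = sum((a - mu_x) * (b - mu_y)
              for a, b in zip(x, y)) / (m - 1)             # Eq. (15)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator                         # Eq. (16)


a = [0.0, 50.0, 100.0, 150.0]
b = [150.0, 100.0, 50.0, 0.0]
same = ssim(a, a)
reversed_score = ssim(a, b)
```

Comparing an image with itself gives a score of 1, while the reversed image, whose pixel values are negatively correlated with the original, scores lower.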

Alignment: Two adjacent elements in a layout usually have six alignment types: left, X-center, right, top, Y-center, and bottom alignment. We use $\varphi=({x^{L},y^{T},x^{C},y^{C},x^{R},y^{B}})$ to denote the coordinates of the upper left corner, the center point, and the lower right corner of an element.

$\displaystyle\textit{Alig}=\sum\limits_{i=1}^{N}\min\left(\begin{array}{c}g({\Delta x_{i}^{L}}),g({\Delta x_{i}^{C}}),g({\Delta x_{i}^{R}})\\ g({\Delta y_{i}^{T}}),g({\Delta y_{i}^{C}}),g({\Delta y_{i}^{B}})\end{array}\right)$ (17)

$\displaystyle\Delta x_{i}^{\ast}({\ast=L,C,R})=\min\limits_{\forall j\neq i}|{x_{i}^{\ast}-x_{j}^{\ast}}|$ (18)

$\displaystyle\Delta y_{i}^{\ast}({\ast=T,C,B})=\min\limits_{\forall j\neq i}|{y_{i}^{\ast}-y_{j}^{\ast}}|$ (19)

where $N$ is the number of elements in the set of adjacent elements, and $g(x)=\left\{\begin{array}{ll}-\log({1-x})&x<1\\ \log(x)&x>1\end{array}\right.$ .
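
Equations (17) to (19) can be evaluated for rectangular elements as in the following minimal sketch. The function names and the box format are hypothetical, coordinates are assumed to be normalized to [0, 1], and the case $x=1$, which the definition of $g$ does not cover, is assigned to the $\log(x)$ branch as an assumption of this sketch.

```python
import math


def g(x):
    # Piecewise function from the text; x == 1 is not covered there, so the
    # log(x) branch is assumed for it in this sketch.
    return -math.log(1 - x) if x < 1 else math.log(x)


def alignment_index(boxes):
    """boxes: list of at least two (left, top, right, bottom) rectangles."""

    def keys(box):
        left, top, right, bottom = box
        # Order: x^L, x^C, x^R, y^T, y^C, y^B.
        return (left, (left + right) / 2, right,
                top, (top + bottom) / 2, bottom)

    feats = [keys(box) for box in boxes]
    total = 0.0
    for i, fi in enumerate(feats):
        # Eqs. (18) and (19): smallest distance to any other element,
        # computed separately for each of the six alignment types.
        deltas = [min(abs(fi[k] - fj[k])
                      for j, fj in enumerate(feats) if j != i)
                  for k in range(6)]
        total += min(g(d) for d in deltas)                 # Eq. (17)
    return total


left_aligned = alignment_index([(0.1, 0.1, 0.5, 0.3), (0.1, 0.4, 0.7, 0.6)])
unaligned = alignment_index([(0.0, 0.0, 0.2, 0.2), (0.5, 0.5, 0.9, 0.9)])
```

Two elements that share a left edge give an index of 0, while the unaligned pair gives a positive value, which is consistent with smaller values indicating better alignment.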

Table 3

Evaluation results of different models

Model	FID	SSIM	Alig
GRIDS	94.14	0.83	0.78
Layout-noA	83.16	0.82	0.73
GT	–	–	0.59
Ours	67.53	0.87	0.66

Fig. 10.

Comparison of three groups of layouts.

Figure 10 shows our results and compares them with GRIDS and the ground-truth layouts (GT). Our layouts are generated from user-input constraints, while the GRIDS layouts are generated using the element sizes of our generated layouts as input conditions. For GT, we select the real layouts that match the user-input constraints. In the layouts generated by GRIDS, the green area represents the title, the magenta area represents the text, and the gray area represents the picture.

Table 3 shows that our generated layouts have the lowest FID value, meaning that they are closest to the GT data, and the highest SSIM value, meaning that they are most similar in structure to the GT data. Because computing FID and SSIM for the GT layouts themselves is meaningless, the table shows “–” instead. Alig is the alignment index of the elements in the layout. The alignment index of the layouts generated by our model is higher than that of the GT data but lower than that of the layouts generated by the other models.

4.5 Qualitative results

As shown in Fig. 11, the input column contains the constraints entered by the user, while the layout on the left is the output of our model. The layout elements generated by our model are distributed more uniformly, are better coordinated, and are more visually pleasing. For example, in the first row, the elements of the layout generated by GRIDS are concentrated toward the top, leaving a large blank area at the bottom of the layout. In contrast, the layout generated by our model leaves white space at the top.

User evaluation: We generate 50 magazine ad layouts with our model and 50 with the GRIDS model, and take the layouts of 50 travel magazine ads from the test set. We then design travel magazine ads based on the corresponding layouts. Since our evaluation focuses on layout, we do not use the original travel magazines for comparison in the interest of fairness, because the original advertisements may contain font decoration (for example, a title set in word art) that could affect the volunteers’ scoring of the overall design. We asked two graduate students majoring in design and two graduate students from other majors to score the resulting 150 magazine advertisements.

Fig. 11.

The layout is generated according to the constraint text input by the user. Based on the generated layout, we design travel magazine advertisements with travel advertisement text and landscape images to better visualize the quality of our model. To make the comparison fair, we design each type of layout using the same text and images as a travel magazine ad.

Fig. 12.

(a) Scores given to the three groups of magazines by graduate students not majoring in design; (b) scores given to the three groups of magazines by graduate students majoring in design.

Fig. 13.

(a) Mean and standard deviation of the scores given to the three groups of magazines by graduate students not majoring in design; (b) mean and standard deviation of the scores given to the three groups of magazines by graduate students majoring in design.

To analyze the effect of the layouts when applied to travel magazine advertisements, we invited four volunteers and asked them to rate each travel magazine poster (1 point means very bad, 5 points means very good). We then tallied the scores separately for non-design majors and design majors (Fig. 12(a, b)). Fig. 13(a) shows the mean and standard deviation of the poster scores given by non-design graduate students for each model. It can be seen that the posters with 4 and 5 points generated by our model account for the majority, while the GRIDS model generates 3 posters with scores of 1 and 4 in the majority. Fig. 13(b) shows the mean and standard deviation of the poster scores given by design graduate students for each model. The larger the mean, the higher the average poster quality; the smaller the variance and standard deviation, the more stable the poster quality.
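The per-model summary described above (mean and standard deviation of 1-to-5 ratings) can be sketched as follows. The ratings and model labels below are invented placeholders, not the study's data, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form was used.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return (mean, population standard deviation) of a list of 1-5 ratings."""
    if not all(1 <= s <= 5 for s in scores):
        raise ValueError("ratings are expected on a 1-5 scale")
    return mean(scores), pstdev(scores)


# Hypothetical ratings for illustration only; NOT the values from the user study.
ratings = {
    "model_a": [4, 5, 4, 5, 4, 3],
    "model_b": [1, 4, 3, 4, 2, 4],
}

for name, scores in ratings.items():
    m, s = summarize(scores)
    # Higher mean: better average quality; lower std: more stable quality.
    print(f"{name}: mean={m:.2f}, std={s:.2f}")
```

Reading the two numbers together matters: a model with a high mean but a large standard deviation produces good posters on average but with less consistent quality.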

5. Conclusion and discussion

In this work, we propose a GAN-based model to tackle the problems of generating magazine layouts from constraint text and of constrained graphic layout generation. We enhance both the discriminator and the generator so that the generator produces higher-quality layouts that are visually appealing and follow the constraints. We demonstrate the effectiveness of the model through extensive quantitative and qualitative experiments. We also design travel magazine advertisements based on the generated layouts for display and compare them with other models and with real data. However, the automatic design of magazine advertisements still has a long way to go: it also needs to consider color matching, harmony, fonts, and other factors, so a lot of work remains for future research.

Acknowledgments

This research was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang under Grant 2023C01231.

References

1. Voloshyna and Marach, Advertising in tourism and leisure, Мова, культура та освіта: Тези доповідей та повідомлень науково-практичної конференції викладачів і студентів ВНАУ, 2015.

2. Weng, Huang and Bao, A model of tourism advertising effects, Tourism Management 85 (2021), 104278.

3. Binkhorst and Den Dekker, Agenda for co-creation tourism experience research, in: Marketing of Tourism Experiences, Routledge, 2013, pp. 219–235.

4. Koffka, Principles of Gestalt Psychology, Routledge, 2013.

5. Karras, Laine and Aila, A style-based generator architecture for generative adversarial networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.

6. Lee H.-Y., Jiang, Essa, Le P.B., Gong, Yang M.-H. et al., Neural design network: Graphic layout generation with constraints, in: European Conference on Computer Vision, 2020, pp. 491–506.

7. Yaoyuneyong, Foster, Johnson and Johnson, Augmented reality marketing: Consumer preferences and attitudes toward hypermedia print ads, Journal of Interactive Advertising 16 (2016), 16–30.

8. Barnard, Graphic Design as Communication, Routledge, 2013.

9. Zheng, Qiao, Cao and Lau R.W., Content-aware generative modeling of graphic design layouts, ACM Transactions on Graphics (TOG) 38 (2019), 1–15.

10. Kovacs, O’Donovan, Bala and Hertzmann, Context-aware asset search for graphic design, IEEE Transactions on Visualization and Computer Graphics 25 (2018), 2419–2429.

11. Ren, Lee and Brehmer, Charticulator: Interactive construction of bespoke chart layouts, IEEE Transactions on Visualization and Computer Graphics 25 (2018), 789–799.

12. Ren, Huang, Wang and Yang, GazeGrid: A novel interaction method based on gaze estimation, in: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021, pp. 1–5.

13. Hosobe, A modular geometric constraint solver for user interface applications, in: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, 2001, pp. 91–100.

14. Lutteroth, Strandh and Weber, Domain specific high-level constraints for user interface layout, Constraints 13 (2008), 307–342.

15. Hosobe, A hierarchical method for solving soft nonlinear constraints, Procedia Computer Science 62 (2015), 378–384.

16. Dayama N.R., Todi, Saarelainen and Oulasvirta, GRIDS: Interactive layout design with integer programming, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.

17. Hasselknippe K.F. and Li, A novel tool for automatic GUI layout testing, in: 2017 24th Asia-Pacific Software Engineering Conference (APSEC), 2017, pp. 695–700.

18. Damera-Venkata, Bento and O’Brien-Strain, Probabilistic document model for automated document composition, in: Proceedings of the 11th ACM Symposium on Document Engineering, 2011, pp. 3–12.

19. Yang, Hertzmann, Zhang and Xu, LayoutGAN: Generating graphic layouts with wireframe discriminators, arXiv preprint arXiv:1901.06767, 2019.

20. Yang, Mei, Xu Y.-Q., Rui and Li, Automatic generation of visual-textual presentation layout, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 12 (2016), 1–22.

21. Biswas, Riba, Lladós and Pal, DocSynth: A layout guided approach for controllable document image synthesis, in: International Conference on Document Analysis and Recognition, 2021, pp. 555–568.

22. Guo, Jin, Sun, Shi et al., Vinci: An intelligent graphic design system for generating advertising posters, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–17.

23. Zhang and Wang, Harmonious textual layout generation over natural images via deep aesthetics learning, IEEE Transactions on Multimedia, 2021.

24. Yang, Zhang, Liu, Wang and Xu, Attribute-conditioned layout GAN for automatic graphic design, IEEE Transactions on Visualization and Computer Graphics 27 (2020), 4039–4048.

25. Arroyo D.M., Postels and Tombari, Variational transformer networks for layout generation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13642–13652.

26. Kikuchi, Simo-Serra, Otani and Yamaguchi, Constrained graphic layout generation via latent optimization, in: Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 88–96.

27. Zhou, Jiang and Xu, Composition-aware graphic layout GAN for visual-textual presentation designs, arXiv preprint arXiv:2205.00303, 2022.

28. Goodfellow, Pouget-Abadie, Mirza, Warde-Farley, Ozair et al., Generative adversarial networks, Communications of the ACM 63 (2020), 139–144.

29. Nunn E.J., Khadivi and Samavi, Compound Frechet Inception Distance for quality assessment of GAN created images, arXiv preprint arXiv:2106.08575, 2021.

30. Hassan and Bhagvati, Structural similarity measure for color images, International Journal of Computer Applications 43 (2012), 7–12.
