Abstract
Among the problems of specifying the style and number of elements in a travel magazine, generating a magazine layout from constraint text, and constrained graphic layout generation more generally, remains complex and unsolved. In this paper, we use a GAN to generate layouts that satisfy text constraints. Because graphic designs are complex and varied, we strengthen both the discriminator and the generator so that the generated layouts adhere more closely to the constraints. To strengthen the discriminator, we add pairs of non-corresponding constraint text and real layouts to its input; to strengthen the generator, we add a spatial attention mechanism to the layout encoder so that it extracts layout features and generates high-quality layouts. We demonstrate that the proposed method generates high-quality layouts that satisfy the text constraints, and we validate its effectiveness through user ratings.
Introduction
Advertisements in travel magazines have grown along with the development of tourism and advertising in China. The product that tourism operators sell to tourists is the travel experience, which is largely intangible: it cannot be seen, touched, or taken away [1]. Conveying product information (tourism products and destination information) effectively to tourists is therefore harder than for ordinary commodities, so the development and prosperity of the tourism industry depend to a large extent on the support of advertisements [2]. The practice of the tourism industry also shows that travel magazine advertisements are an important way to present the image of a tourist destination and the appeal of tourism products to tourists. The layout is the basis of travel magazine advertisement design. Layout design for print ads is important in multimedia creation and has a wide range of applications in advertising, books, and websites. A good design is visually pleasing while conveying information effectively, but high-quality travel magazine ad design depends heavily on experienced designers [3]. When designing a magazine, the designer often needs to create a layout that combines multiple elements, and sometimes needs to design magazine advertisements with a similar style but different layouts. The designer usually starts from the available materials (such as how many pictures and how many text descriptions can be used in total) and then designs the travel magazine advertisement according to the number of materials, the size of the pictures, and the desired design style. This process takes a lot of time, because there are many possible ways to organize a given set of elements into a layout. 
A good layout must meet design goals (such as reading order, alignment between elements, etc.), and changing one element may require reorganizing many others [4].
In this work, our objective is to generate layouts for tourism magazine advertisements from user input constraints, such as the number of available design materials or layout style preferences. As shown in Fig. 1, we aim to create high-quality tourism magazine advertisement layouts that satisfy the given constraints (in this example, a three-column composition with one red area, three yellow areas, and one green area). Previous research has started to apply learning-based methods to magazine advertisement layout generation and has achieved some success in generating natural, realistic images [5], but the field has not been extensively explored. Existing approaches mainly rely on templates or heuristic rules for layout generation [6], and they are limited in handling relationships between elements, such as interdependencies, alignment, and overlap. Moreover, using generative models to produce effective layouts that respect user style preferences and constraints remains challenging. Further research is therefore needed on learning methods that capture both element relationships and user constraints when generating tourism magazine advertisement layouts.
Layout generated according to user constraints, and an example design constructed from the generated layout.
Therefore, this paper proposes a method for generating customized layouts for travel magazine advertisements based on user input constraints.
In summary, we make the following contributions in this work:
We introduce a spatial attention mechanism into the GAN and learn the features of the layout image through a recurrent neural network, which improves the performance of the generator and raises the quality of the generated layouts.
We introduce a mixer into the GAN that adds a third type of input to the discriminator, consisting of mismatched text and real layouts; this improves the performance of the discriminator and optimizes the learned constraint relation between layout and text.
Travel magazine advertisements superimpose images and text and present both through typesetting [7]. They are a kind of graphic design, usually used to show the scenery of scenic spots and convey information about them to tourists in order to promote tourist attractions. Creating high-quality travel magazine ads can be difficult and time-consuming when designers face a plethora of design choices. Given images and text descriptions of specific attractions, designers usually need to explore a large number of layout styles and then design travel magazine advertisements according to those styles, and sometimes they need to produce a large number of magazine advertisements. Research on intelligently generating travel magazine advertisements can therefore save designers a lot of time in layout design, and it can also allow non-design professionals to design high-quality, harmonious, and beautiful travel magazine advertisements. The most important aspect of travel magazine advertisement design is the layout of each element (title, subtitle, image, text, etc.). The layout is at the heart of graphic design, including magazines, posters, comics, and web pages [8]. A high-quality layout facilitates the presentation of information while capturing the reader's attention and enhancing the visual impact of a travel magazine ad. In recent years, the problem of graphic design layout has received increasing attention from the graphics community. Some previous work attempts to model graphic design layouts in order to generate layouts guided by style, perception, and aesthetics. Rich layout variations in graphic design are largely driven by the visual and textual content to be presented [9]. Automatic layout generation is a major research hotspot in the field of graphic design [10, 11]. 
In automatic layout generation, grid layout is used as a design principle in many toolkits and layout managers, which provide interactive aids such as grid-snapping and automatic alignment [12]. Hosobe et al. [13] address nonlinear geometric constraints such as Euclidean geometry, non-overlapping, and graph layout constraints, and they also discuss soft constraints with hierarchical dominance or preference. Lutteroth et al. [14] proposed the ALM layout model, a constraint-based technique for specifying 2-D layouts. Hosobe et al. [15] proposed a hierarchical method for solving soft nonlinear constraints; it uses hierarchical preferences to handle soft nonlinear constraints, computes a solution that satisfies as many constraints as possible while respecting strong preferences, and adopts the method of Lagrange multipliers to compute local solutions exactly. Dayama et al. [16] proposed an interactive layout design approach based on integer programming: given the size of each element, a mixed integer linear programming method keeps the elements from overlapping and thereby produces the layout of a magazine. In graphical user interfaces designed with a grid layout, the spatial structure is defined by grid lines, which guide the size and position of GUI elements to simplify the layout design [17]. Grid-based layout design is mostly manual; the computer only helps arrange each module as attractively as possible without overlap. Damera-Venkata et al. [18] proposed to model the relationships between page elements, solving the document arrangement problem with probabilistic inference in a Bayesian network. Automatic layout design systems are also popular in automatic typesetting, for example to generate a magazine from preferences set by the user. Li J et al. [19] proposed Layout-GAN to synthesize layouts by modeling the geometric relationships of different types of 2D elements. 
Lee et al. [6] propose a design layout generation method that satisfies user-specified constraints. In [20], a visual-textual presentation layout is designed with predefined layout templates and aesthetic design principles. Biswas et al. [21] proposed a method for the automatic synthesis of document images based on a given layout. Guo et al. [22] designed the Vinci system, which uses a deep generative model to match a product image with a set of design elements and layouts to generate aesthetically pleasing posters. Li C et al. [23] proposed an efficient deep aesthetic learning method to generate harmonious text layouts on natural images. Li J et al. [24] introduce attribute-conditioned layout GANs, which incorporate attributes of design elements into graphic layout generation by forcing the generator and discriminator to satisfy attribute conditions. Arroyo D et al. [25] propose to exploit the properties of self-attention layers to capture high-level relationships between elements in a layout and use these layers as building blocks of the well-known variational autoencoder (VAE) formulation. Kikuchi K et al. [26] propose a generative layout model built on the transformer architecture and formulate layout generation as a constrained optimization problem, where constraints are designed for element alignment, overlap avoidance, or any other user-specified relationship. Zhou M et al. [27] proposed a deep generative model called composition-aware graphic layout GAN (CGL-GAN), which synthesizes layouts based on the global and spatial visual content of input images.
Our model
Since the ultimate goal of this paper is to determine the layout of travel magazine advertisements from the design style or the number of specific elements input by the user, the layout of magazine advertisements is generated by the model shown in Fig. 2. Each layout
The framework of our model.
The text data is processed, and each text is converted into a matrix in the form of one-hot encoding, and passed through an embedding and an RNN layer, as shown in Fig. 3.
Text encoding model diagram.
The text encoder learns text features from the constraint text and uses them to guide the generation of magazine ad layouts. The text label of each layout consists of nine phrases: layout style, text proportion, picture proportion, title proportion, background proportion, proportion of titles on pictures, proportion of text on pictures, the total number of elements, and the number of each individual element. As shown in Fig. 3, the constraint text of each travel magazine advertisement layout is converted into a matrix by one-hot encoding and then passed through the word embedding layer, whose output is s, and finally through the recurrent neural network (RNN) layer, whose output is the vector y.
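The encoding pipeline described above (one-hot, embedding output s, RNN output y) can be sketched as a plain forward pass. This is a minimal illustration, not the implementation used in this paper: the dimensions, the random untrained weights, the tanh recurrence, and the example phrase are all assumptions made for the sketch.

```python
import math
import random


def one_hot(index, size):
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode_text(tokens, vocab, embed_dim=8, hidden_dim=16, seed=0):
    """One-hot -> embedding (output s) -> simple tanh RNN (final output y)."""
    rng = random.Random(seed)
    w_embed = random_matrix(embed_dim, len(vocab), rng)
    w_in = random_matrix(hidden_dim, embed_dim, rng)
    w_rec = random_matrix(hidden_dim, hidden_dim, rng)
    h = [0.0] * hidden_dim
    for token in tokens:
        x = one_hot(vocab[token], len(vocab))
        s = matvec(w_embed, x)  # embedding output for this token
        pre = [a + b for a, b in zip(matvec(w_in, s), matvec(w_rec, h))]
        h = [math.tanh(v) for v in pre]  # recurrent state update
    return h  # final hidden state, used as the text feature y


# Hypothetical constraint phrase and vocabulary, for illustration only.
tokens = ["three", "column", "composition"]
vocab = {word: i for i, word in enumerate(sorted(set(tokens)))}
y = encode_text(tokens, vocab)
```

In practice the embedding and recurrent weights would be trained jointly with the rest of the network rather than drawn at random.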
Where
Attention-based models have gained popularity in a variety of computer vision and machine learning tasks, including neural machine translation, image classification, image segmentation, image and video captioning, and visual question answering. Attention improves the performance of all these tasks by encouraging the model to focus on the most relevant parts of the input.
Layout encoding model diagram.
In the layout encoder, the input is the real layout P, and the output is the feature vector X of the layout. The shape of each layout is (64
The input
Mix and shuffle constraint text and layout.
The input in the discriminator is the constraint text and the layout pair
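The mixing step, which gives the discriminator three kinds of text and layout pairs, can be sketched as follows. This is a hedged illustration only: the function name `build_discriminator_inputs`, the string placeholders, and in particular the choice to label mismatched pairs as 0 (treated like generated layouts) are assumptions, not details confirmed by the text.

```python
import random


def build_discriminator_inputs(texts, real_layouts, fake_layouts, seed=0):
    """Return shuffled (text, layout, label) triples of three kinds.

    label 1: constraint text with its own real layout
    label 0: constraint text with the generated layout
    label 0: constraint text with a real layout of a different sample
             (assumed labeling for the mismatched pairs)
    """
    n = len(texts)
    if n < 2:
        raise ValueError("need at least two samples to form mismatched pairs")
    rng = random.Random(seed)
    triples = []
    for i in range(n):
        triples.append((texts[i], real_layouts[i], 1))
        triples.append((texts[i], fake_layouts[i], 0))
        j = rng.choice([k for k in range(n) if k != i])  # a different sample
        triples.append((texts[i], real_layouts[j], 0))
    rng.shuffle(triples)
    return triples


# Placeholder strings stand in for encoded texts and layout images.
texts = ["t0", "t1", "t2"]
real_layouts = ["r0", "r1", "r2"]
fake_layouts = ["f0", "f1", "f2"]
batch = build_discriminator_inputs(texts, real_layouts, fake_layouts)
```

Pairing each text with a real layout from another sample forces the discriminator to check whether the layout matches the text, not only whether the layout looks real.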
The generative adversarial network (GAN) is composed of a generator G and a discriminator D. The GAN is a two-player system: the generator tries to confuse the discriminator, while the discriminator tries to judge the source of the input image as accurately as possible. The two models are adversaries, and each improves by trying to beat the other. Through the discriminator, the generator gets feedback on whether the images it generates are consistent with the image distribution of the dataset, and the discriminator gets more training samples from the generator. So
Goodfellow I et al. [28] proved that this minimax game has a global optimum exactly when the distribution of the generator equals the distribution of the real data.
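For reference, the unconditional minimax game analyzed in [28] can be written as below, with $p_{\text{data}}$ the data distribution and $p_z$ the noise prior. The losses used in this paper additionally condition on the constraint text and include the mismatched pairs, so this is only the standard starting point, not the objective of our model.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```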
In the generator
In discriminator
Equations (9) and (11) use the gradient step size to update the generator G and discriminator D network parameters. Equations (8) and (10) are the loss functions of the discriminator and the generator, respectively, where
Top: the layout generated by the generator; bottom: the optimized layout.
GANs are known to have difficulty reconstructing fine visual detail. This matters little here, because we generate a layout structure rather than a sharp photograph. However, the images generated by the GAN still contain noise points, and the element boundaries in the layout are not generated as exact rectangles, so the layout needs to be refined, as shown in Fig. 6. The process is as follows. First, the internal noise points of each label after semantic segmentation are identified by color recognition and filled with the color of the corresponding label. Next, the boundary noise points of each label are removed. Finally, the boundaries of each label are corrected: we obtain the point sets of the four boundaries of the color area and compute the average point of each boundary; the vertical coordinates are taken from the average points of the upper and lower boundaries, and the horizontal coordinates from the average points of the left and right boundaries. Combining these coordinates gives four points, which serve as the four vertices. The rectangle enclosed by these four vertices is taken as the corrected boundary of the color region.
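The boundary-correction step can be sketched on a small label grid. This is an assumption-laden illustration: the grid of integer labels stands in for the color image, the function names are hypothetical, boundary points are taken per column (top and bottom) and per row (left and right), and averages are rounded to the nearest pixel.

```python
def rectify_region(grid, label):
    """Return (left, top, right, bottom) of the corrected rectangle, or None."""
    rows, cols = len(grid), len(grid[0])
    top, bottom, left, right = [], [], [], []
    for c in range(cols):  # upper and lower boundary points, one per column
        ys = [r for r in range(rows) if grid[r][c] == label]
        if ys:
            top.append(min(ys))
            bottom.append(max(ys))
    for r in range(rows):  # left and right boundary points, one per row
        xs = [c for c in range(cols) if grid[r][c] == label]
        if xs:
            left.append(min(xs))
            right.append(max(xs))
    if not top:
        return None

    def mean(points):
        return round(sum(points) / len(points))

    return mean(left), mean(top), mean(right), mean(bottom)


def paint_rectangle(grid, label, box, background=0):
    """Clear stray pixels of `label`, then fill the corrected rectangle."""
    left, top, right, bottom = box
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            inside = top <= r <= bottom and left <= c <= right
            if inside:
                row[c] = label
            elif value == label:
                row[c] = background


# A 6x6 toy layout: label 1 is an almost-rectangular region with one
# missing corner pixel at row 1, column 4.
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
box = rectify_region(grid, 1)
paint_rectangle(grid, 1, box)
```

After painting, the region occupies the full rectangle spanned by the four averaged boundaries, so the missing corner pixel is filled.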
We compare against three references: GRIDS, Ground Truth (GT), and a Baseline. Baseline: a basic GAN without our additions to the discriminator and the generator; its text encoder is the same as ours, while its layout encoder is a single CNN.
Data set
Training the model in this paper requires the layouts of real travel magazine advertisements, but only the advertisements themselves are available, so they must first be annotated with semantic layouts.
We use the Python-based Scrapy crawler framework to build a travel magazine advertisement search project and collect 682 travel magazine advertisements from travel websites. After filtering out advertisements that are blurry or otherwise do not meet the requirements, 626 remain, which we split into a training set (70%) and a test set (30%). We define six types of labels: travel magazine ad text, travel magazine ad title, picture, title on the picture, text on the picture, and background, represented by yellow, green, red, purple, blue, and gray areas, respectively. The resulting layouts are shown in Fig. 7. This article distinguishes heading elements from other text elements because heading elements play an important role in graphic design layout. Next, we manually segment a part of the collected travel magazine advertisements into the six types of labels and use them as the training set for a fully convolutional network (FCN). The trained FCN then divides the remaining travel magazine advertisements into the six categories of labels through semantic segmentation. Because the training set is relatively small, we enlarge it with data augmentation techniques (random resizing, random horizontal flipping, and random cropping). Each label after semantic segmentation is refined with the same method as the layout optimization in Section 3.5.
We use color recognition (for example, with Python's OpenCV package) to identify the four vertex coordinates of each color area. From these four coordinates, we calculate the proportion of each label in the tourism magazine advertising layout. We also identify the number of color areas in each layout, which types of labels each layout is composed of, and the number of areas of each label. A survey of layout classification shows that the layouts can be divided into seven types of composition: circular composition, palace-style composition (including four-square, six-square, and nine-square grid compositions), left-right symmetrical composition, three-column composition, combined composition, split composition (where one label occupies most of the layout), and two-column composition. We manually classify each layout into one of these seven layout styles. The semantic information of the layout is expressed through the nine sentences in Table 1, which serve as the constraint text of the layout. Table 1 is a template for the constraint text.
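Once the vertex coordinates of each color area are known, the label proportions and element counts follow from simple arithmetic. The sketch below assumes each area is given as a label plus an axis-aligned box (x0, y0, x1, y1); the function name, the label strings, and the 100 by 100 canvas are hypothetical values chosen for illustration.

```python
from collections import Counter


def label_statistics(regions, width, height):
    """Return (proportion of canvas per label, number of areas per label)."""
    canvas_area = width * height
    areas = Counter()
    counts = Counter()
    for label, (x0, y0, x1, y1) in regions:
        areas[label] += (x1 - x0) * (y1 - y0)  # area of this rectangle
        counts[label] += 1
    proportions = {label: area / canvas_area for label, area in areas.items()}
    return proportions, dict(counts)


# Hypothetical layout: one picture area and two text areas.
regions = [
    ("picture", (0, 0, 100, 50)),
    ("text", (10, 60, 50, 90)),
    ("text", (60, 60, 90, 90)),
]
proportions, counts = label_statistics(regions, width=100, height=100)
```

Here the picture covers half of the canvas and the two text areas together cover 21% of it; these are the kinds of values that fill the proportion and quantity slots of the constraint text.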
Constraint text template
When using the test set to evaluate the model, the input is the nine constraints of the layout, and the output is the layout image. In the user evaluation experiments, the constraints input by the user are the style of the layout and the number of elements in the layout. The proportion of each element in the layout is an input we supply when training the model.
Example of magazine layout.
To better capture local information in the layout, such as the regularity of element boundaries and the requirement that generated elements do not overlap, we introduce a spatial attention mechanism into the GAN and learn the features of the layout image through a recurrent neural network, which improves the performance of the generator and the quality of the generated layouts. For verification, we compare our proposed model with the same model without the spatial attention mechanism. Note that we do not compare loss functions, because the main advantage of the attention mechanism is that it makes the boundaries of the generated layout elements more regular. We therefore evaluate the generated layouts quantitatively with a dedicated metric, overlap (the smaller the value, the better).
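As a rough illustration of an overlap measure, the sketch below sums the pairwise intersection areas of the element boxes and divides by the total element area. This normalization is an assumption made for the example and may differ from the formula used in this paper; the function names and box format (x0, y0, x1, y1) are likewise hypothetical.

```python
from itertools import combinations


def intersection_area(a, b):
    """Area shared by two axis-aligned boxes (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def overlap_index(boxes):
    """Total pairwise intersection area divided by total element area."""
    total_area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)
    if total_area == 0:
        return 0.0
    shared = sum(intersection_area(a, b) for a, b in combinations(boxes, 2))
    return shared / total_area


# Two 10x10 boxes sharing a 5x5 corner, and a clean two-box layout.
overlapping = overlap_index([(0, 0, 10, 10), (5, 5, 15, 15)])
clean = overlap_index([(0, 0, 10, 10), (20, 0, 30, 10)])
```

A layout whose elements do not intersect scores 0 under this definition, which matches the intent that smaller values are better.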
The results of the overlap index of the two models
Illustration of Eq. (12).
Overlap calculation formula:
Where,
Layouts generated by our model, the Layout-noA model, and the Baseline model, before and after optimization. Top: before optimization; bottom: after optimization.
Layout-noA denotes our model without the spatial attention mechanism.
Table 2 and Fig. 9 show that after adding the attention mechanism, the overlapping area of the elements in the layout is reduced and the generated layout is more regular. Moreover, the Baseline model should generate three yellow areas in the middle of the layout, but the generated layout contains only two. This indicates that our mixer helps generate layouts that better meet the constraint conditions. Because the Baseline model performs worse than Layout-noA, we compare our model directly with Layout-noA. The layouts generated by the Baseline model do not meet the constraints, and they remain unsuitable when text and images are pasted onto them for the final display.
Note that perceived layout quality is subjective when it comes to judging layouts as good or bad, so it is difficult to define a single metric for it. We therefore use four metrics to measure the quality of the generated layouts: Fréchet inception distance (FID) [29], structure similarity index measure (SSIM) [30], alignment [24], and user evaluation.
Quantitative results
FID: We use this distance to measure the similarity between the real and generated images. The smaller the FID value, the higher the similarity. The best case is FID = 0.
SSIM: The value range of SSIM is [0, 1]. The larger the value, the smaller the generated layout distortion and the more similar to the real layout structure.
Where
Alignment: Two adjacent elements in a layout usually have six alignment types: left alignment, X center alignment, right alignment, top alignment, Y center alignment, and bottom alignment. Respectively use
where N is the set of adjacent elements,
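The six alignment types can be turned into a number in several ways. The sketch below is one hedged possibility, not the formula of this paper: for each pair of adjacent elements it takes the smallest of the six offsets (left, X center, right, top, Y center, bottom) and averages over the pairs. The box format (x0, y0, x1, y1), the function names, and the explicit list of adjacent pairs are assumptions.

```python
def alignment_gap(a, b):
    """Smallest of the six alignment offsets between boxes a and b."""
    ax_c, bx_c = (a[0] + a[2]) / 2, (b[0] + b[2]) / 2
    ay_c, by_c = (a[1] + a[3]) / 2, (b[1] + b[3]) / 2
    return min(
        abs(a[0] - b[0]),  # left alignment
        abs(ax_c - bx_c),  # X center alignment
        abs(a[2] - b[2]),  # right alignment
        abs(a[1] - b[1]),  # top alignment
        abs(ay_c - by_c),  # Y center alignment
        abs(a[3] - b[3]),  # bottom alignment
    )


def alignment_index(boxes, adjacent_pairs):
    """Mean smallest alignment offset over the given adjacent pairs."""
    if not adjacent_pairs:
        return 0.0
    gaps = [alignment_gap(boxes[i], boxes[j]) for i, j in adjacent_pairs]
    return sum(gaps) / len(gaps)


# Three stacked boxes: the first two share a left edge exactly, the
# third is shifted 2 pixels to the right of the second.
boxes = [(10, 10, 50, 30), (10, 40, 70, 60), (12, 70, 55, 90)]
index = alignment_index(boxes, [(0, 1), (1, 2)])
```

Under this definition a pair that is exactly aligned in any one of the six ways contributes 0 to the average.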
Evaluation results of different models
Comparison of three groups of layouts.
Figure 10 shows our results and compares them with GRIDS and the real layouts (GT). Our layouts are generated from user-input constraints, while the GRIDS layouts are generated using the element sizes of our generated layouts as input conditions. For GT, we select the real layout that matches the user-input constraints. In the layouts generated by GRIDS, the green area represents the title, the magenta area represents the text, and the gray area represents the picture.
Table 3 shows that our generated layouts have the lowest FID value, meaning they are closest to the GT data, and the highest SSIM value, meaning they are most similar in structure to the GT data. Because computing FID and SSIM for the GT layouts against themselves is meaningless, this article uses “–” instead. Alig denotes the alignment index of the elements in the layout. The alignment index of the layouts generated by our model is higher than that of the GT data but lower than that of the layouts generated by the other models.
As shown in Fig. 11, the input column lists the constraint conditions input by the user, while the layout on the left is the output of our model. The layout elements generated by our model are distributed more uniformly and look more coordinated and more attractive. For example, in the first row, the layout generated by GRIDS is shifted upward, leaving a large blank area at the bottom of the layout. In contrast, the layout generated by our model leaves white space at the top.
User evaluation: We use our model to generate 50 magazine ad layouts and the GRIDS model to generate another 50, and we also take the layouts of 50 travel magazine ads from the test set; travel magazine ads are then designed from each of these layouts. Since our evaluation focuses on layout, we do not use the original travel magazines for comparison in the interest of fairness, because an original travel magazine advertisement may contain font decoration (for example, the title may use word art) that could affect the volunteers' scoring of the overall design. We asked two design graduate students and two non-design graduate students to score the resulting 150 magazine advertisements.
The layout is generated according to the constraint text input by the user. Based on the generated layout, we design travel magazine advertisements with travel advertisement text and landscape images to better visualize the quality of our model. To make the comparison fair, we design each type of layout using the same text and images as a travel magazine ad.
(a) Scores given to the three groups of magazines by graduate students not majoring in design; (b) scores given to the three groups of magazines by graduate students majoring in design.
(a) Mean and standard deviation of the scores given to the three groups of magazines by graduate students not majoring in design; (b) mean and standard deviation of the scores given to the three groups of magazines by graduate students majoring in design.
To analyze the effect of the layouts when applied to travel magazine advertisements, we invited four volunteers and asked them to rate each travel magazine poster (1 point means very bad, 5 points means very good), and then collected the scores of design majors and non-design majors separately (Fig. 12(a, b)). Fig. 13(a) shows the mean and standard deviation of the scores given by non-design graduate students to the posters of each model. It can be seen that the posters with 4 and 5 points generated by our model account for the majority, while the GRIDS model generates 3 posters with scores of 1 and 4 in the majority. Fig. 13(b) shows the mean and standard deviation of the scores given by design graduate students to the posters of each model. The larger the mean, the higher the average poster quality; the smaller the variance and standard deviation, the more stable the poster quality.
In this work, we propose a GAN model for generating magazine layouts from constraint text and for constrained graphic layout generation. We enhance the performance of both the discriminator and the generator so that the generated layouts follow the constraints more closely, are of higher quality, and are visually appealing. We demonstrate the effectiveness of the model through quantitative and qualitative experiments, and we design travel magazine advertisements based on the generated layouts and compare them with those of other models and with real data. However, the automatic design of magazine advertisements still has a long way to go. Magazine advertisements also need to take color matching, harmony, fonts, and other factors into account, so much work remains for the automatic generation of magazine advertisements.
Acknowledgments
This research was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang under Grant 2023C01231.
