Abstract
Traditional sketch-based 3D model retrieval methods are content-based, which return the search results by ranking the geometric similarities among a free-hand drawing and 3D model candidates. These conventional methods do not consider personal drawing characteristics and styles (abbreviated as styles), which are obvious and important in user’s sketch queries. An ordinary user presumably is not a professional and skillful artist. Therefore, users are likely to introduce personal drawing style in sketching 3D model rather than faithfully render the model according to its geometric perspectives. For amateurs, such personal styles are unintentionally introduced due to their limited sketching capabilities. As determined by a person’s sketching habit, personal drawing styles are largely personally consistent and stable. Ignoring such non-trivial personal styles while attempting to reconstruct intended models according to their sketch inputs does not usually produce satisfactory outcomes, in particular, for amateur sketchers. To overcome this problem, we propose a novel style-sensitive 3D model retrieval method based on three-view user sketch inputs. The new method models users’ personal sketching styles and constructs joint tensor factorization to improve the retrieval performance.
Introduction
With the rapid development of 3D modeling and printing technologies as well as the Internet technology, 3D models have become widely accessible. Following text and images, the information overload problem occurs again in 3D models. Despite the predominant role of text-based search methods in today’s information retrieval world, text searches cannot effectively retrieve 3D models, the reason is that these models do not apply to keyword-based descriptions since it is not always appropriate or feasible to describe geometric features of a 3D model using natural languages, letting alone queries composed of a few keywords. When it comes to describing a 3D model, instead of keyword-based search technologies, sketching is considered as one of the most intuitive and convenient methods to describe a 3D model [1–5]. Therefore, how to design a sketch-based 3D model retrieval method has greatly drawn researcher’s attention.
Reconstructing a 3D model according to a single view sketch drawing is not easy. A sketch drawing often reflects a user’s abstract perception of a 3D model from one or multiple views. Because the majority of web users are not professional sketch illustrators, the distortion of sketches is inevitable. As a consequence, the outline structure of the 3D model can not be accurately obtained through affine transformation or other techniques. Therefore, the sketch-based 3D model retrieval is always a rather challenging work.
A bulk of traditional sketch-based 3D model retrieval methods [6–9] are only based on the content similarity between the user sketch and two-dimensional projection of 3D model. These are content-based methods. However, the user sketching styles are not taken into consideration to improve the performance of the 3D model retrieval. To solve the distortion problem of sketches, we propose a sketching-style-based 3D Model retrieval method.
Related works
In usual, the sketches generated by the unprofessional painters, who have not received long-time training, may be distorted and even incorrect. Therefore, both the geometric and semantic gaps between these user-generated sketches and 3D models always exist. Yet early research mainly match the content features of the sketches with that of the 3D model projections without specific consideration on the personalized deviation of the sketches. Image processing and pattern recognition technologies develop very fast [10–14, 42–46]. As in the image retrieval area, there are many outline-sketch-based retrieval engines, such as the followings. Hirata et al. [15] tried to retrieve images by comparing the extracted edge images of the users’ input sketches with 205 pieces of edge extraction images of color images in the database. These images were normalized on scale and were divided into partial blocks, for each of which the best local correlation was calculated. Finally, the global similarity was computed through the cumulative local correlation values.
Rajendran et al. [16] conducted image retrieval in a database containing nearly 5000 pieces of color images. The strongest edges of the images were encoded by using direction and curvature. This method used multiple scales to describe different levels of details of the input sketches. Loffler et al. [1] developed the affine transformation invariant descriptor for single contour. In the graph retrieval area, Funkhouser et al. [2] described the method which was based on graph projection view matching method. In the pre-processing stage, boundary contours for every model were extracted from different directions, of which each was represented by global boundary rotation invariant feature descriptor. Then the best match was obtained by comparing the feature description of certain view with the feature description of the sketch input of the corresponding view. The image retrieval technology based on sample presented by Chen et al. [3] also supported sketch input. In this method, the light field descriptor was obtained through the intensive sampling in the view direction, but the descriptor could only describe closed contour curve.
Due to the unsatisfactory performance of the sketch-interaction-based retrieval methods, a batch of related studies have emerged recently. In 2012, Eitz et al. [9] adopted bag-of-words and Gabor filter to realize three-dimensional model retrieval based on single sketch. Eitz et al. [17] also developed a feature package sketch representation method and used multi-class support vector machine to train the sketch dataset. They obtained 56% accuracy for test sketches and achieved an effective interactive sketch recognition system. Xu et al. [18] presented the 3D model co-retrieval based on the sketches of 3D models. Sun et al. [19, 20] proposed a sketch-based retrieval system which could effectively retrieve among hundreds of millions of images by using comprehensive information such as sketches and key words, sketches and color. Sun et al. [21, 22] applied the query-adaptive shape topic model to the recognition of sketches and their system collected web-scale clipart images. Zhou et al. [23, 24] proposed a new sketch-based image retrieval algorithm. In their retrieval algorithm, the first one was the weighting center of the feature of the images which ensured object in the image could be retrieved while ignoring the size and the location; the second one was region of interest which was prominent in the image. It helped to retrieve similar images of the target from the complicated scene. By setting the two regions as candidate regions, the retrieval speed was accelerated. Besides the region methods above, the authors extracted direction feature and used a hierarchical way to describe global-local features. Yoon et al. [4] illustrated the 3D model retrieval method which used the diffusion tensor fields for descriptor calculation. Contour-based image search engine MindFinder [25] was a sketch-based retrieval system which could realize real-time retrieval in large scale database independent of keywords. Daras et al. [26] described a unified framework, which supported hand-drawn-sketch-based 3D model retrieval and sample-based 3D model retrieval. From each model, 32 views were extracted and then three 2D rotation invariant shape descriptors were computed for each view. But the performance of the hand-drawn-sketch-based retrieval was not quantitatively evaluated. Cao et al. [27] described the sketch-based retrieval system which could realize real-time retrieval in large scale database. Chen et al. [28] presented Sketch2Photo system, which could composite an image according to retrieval results from text annotations and corresponding sketches.
Sketching is people’s favourite kind of human-computer interaction, so it is taken seriously by many researchers. The sketch-based visual information retrieval is of great importance. All above are useful for sketch-based retrieval. Many researchers have studied the sketch-based 3D model retrieval systems. However, the majority of them are only based on the content, and they do not analyze the sketching styles of the users and the user behavior data to improve the retrieval performance. In order to increase retrieval performance, we will consider sketching styles which are not considered by the existing works.
Sketching-style-based 3D model retrieval
Different from the traditional sketch-based 3D model retrieval systems which do not consider the user sketching style, the user-sketching-style-based sketch retrieval method which we proposed will improve the retrieval performance by considering the sketching styles of the users.
What is the sketching style of the users?
Sketching style is an abstract concept. It is related to the personal habit and drawing techniques and is hard to express with limited words. Therefore we do not make the artificial definition for it. Sketching styles can be learned from the sketch data of the users. Specifically, we could cluster the 3D models. And the models similar in shape would be clustered into one cluster. Then we would collect the sketch data of all the users for certain cluster of 3D models. And those sketches would be clustered into N clusters. These clusters are the users’ styles which are automatically learned but not artificially defined. The sketching style number N is not manly preseted but is learnt from the specific sketch data by clustering. Quantitative description of the sketch styles greatly enhances users’ confidence in the retrieval system.
Quadripartite graph and tripartite graph
When the user styles are obtained, it is easy to find the associations in users-views-styles-3D models which can be represented by a quadripartite graph as shown in Fig. 1. This can be transformed into the fourth order tensor. User, views, styles and 3D models are four modes of the tensor. We can call them as mode 1, mode 2, mode 3 and mode 4 respectively.
Each user could draw her/his sketches from various views. However, the three views (front, side, top) are more in line with the cognitive rule of our human because they seamlessly describe the shape of the 3D model from three orthogonal directions. So we just select the three orthogonal views (front, side and top), which means that the mode 1(users) and mode 2 (views) are compressed into one mode. Therefore, the quadripartite graph in Fig. 1 can be reduced to a tripartite graph by merging mode1 and mode 2 into one mode, as shown in Fig. 2.
Joint tensor factorization
The tripartite graph in Fig. 2 can be transformed into the third order tensor. Considering there are many associations in the tripartite graph (i.e. three order tensor) as shown in Fig. 3: Any two of the three dimensions are associated with each other, and there are similarity between different 3D models and between different sketching styles. So we use a joint tensor factorization algorithm to deal with various associations.
Firstly, we give an introduction of the physical meaning of each data in the third order tensor, as shown in Fig. 4. The data in the third order tensor of Fig. 4 can be divided into three parts: Users. Here, it means the users who have registered and logged in the sketch-based 3D retrieval system. If the number of users is too large, it could be reduced by clustering and selecting the representative users out. The number of user clusters is not artificially defined but determined by cluster algorithm. Sketching styles. The sketching styles will be learned from the Sketch data with the following methods. Firstly, the 3D models should be clustered. The similar 3D models will be clustered into one cluster and one representative sample would be elected as the cluster center for each cluster. The number of 3D models in each cluster is not artificially defined but learned from the data. Secondly, there are many kinds of three-view sketching styles for each kind of 3D model, and too fine granularity would lead to high time complexity. Therefore, three-view sketches of all the users for one 3D model cluster will be clustered into N sketching styles, the number of which is also not artificially defined but learned from the sketches of all users. 3D models. We will use the PSB [29] dataset which includes a large number of 3D models stored in the format of off. The two-dimensional outlines of the three views of these models will be extracted in our pre-processing step. For 3D models, some of them are greatly similar in their sketch outlines. It calls for too fine processing granularity and increases time complexity. To solve this problem, we could cluster similar 3D models into one cluster, such as clustering eight-legged spiders into one cluster, paving the way for the joint tensor factorization in the following.
For the user clusters, we only consider the user similarity based on the user sketching styles. Based on the user behavior data, the user similarity could be obtained, which is useful for user cluster. The method is as follows. A user-user sketching style similarity statistical model could be established. As shown in Fig. 5, a “user-user-3D model” tensor would be constructed. We fill the similarity of sketches of the same one(cluster) 3D model(s) of any two users into the third-order tensor space as its elements. This model could be seen as a three-dimensional space model. The coordinates of the first and second dimensions are the user ID. And the coordinates of the third dimension is the 3D model (cluster) ID. The values of the elements in the three-dimension space means the similarity of the sketches of different users to the same 3D model (cluster). For any two different users, if there are many elements where their values are close to 1, then it means the two users are greatly similar to each other on style. To describe this similarity statistically, we could compress this 3D statistical model into a user-user two-dimensional model by mathematical methods and attain the user-user sketching style similarity matrix. The horizontal and vertical coordinates of the matrix are user ID and the elements in the matrix stand for the similarity of the sketching style of any two users statistically. There must be missing elements in the original 3D matrix. For the 3D model sketches which two users drew, the more similar sketches are, the more similar the users’sketching style is regarded to be. That is to say, the more the high similarity elements of the two users there are, the larger the user sketching style similarity will be. If some 3D models are too simple, it would increase the similarity of the sketches, so certain punishment should be exposed to it by reducing its weight coefficient. On the basis of the above method, users can be clustered to M user clusters, where M could be close to the magnitude of the 3D model clusters. This means the representative users would be retained.
The tripartite graph shown in Fig. 2 can exactly be expressed as the third-order user-style-model tensor as shown in Fig. 4. The tensor is sparse and there are many missing elements in it. It can be completed by tensor factorization. The advantage of this method is that on the basis of giving full consideration to the latent overall association of data in all dimensions of the tensor space, it could reasonably predict the missing data. In the tensor factorization, it reserves the overall physical relationship of the data of each dimension in the tensor space. Here, we will construct a joint tensor factorization for 3D model retrieval. Traditional tensor factorization models [30] include TD model (Tucker Decomposition model), and CP model (CANDECOMP/PARAFAC, also called PARAFAC model). Steffen et al. [31, 32] proposed pairwise interaction tensor factorization (PITF) model, but they did not take the problem of joint tensor factorization into consideration. The work of [33–35] introduced the matrix factorization methods but did not involve tensor factorization. The work of [36] introduced a non-negative tensor factorization algorithm but did not involve joint tensor factorization. The above methods did not discuss joint tensor factorization which was used for 3D model retrieval. The work of [37] used the PARAFAC tensor decomposition for collaborative recommendation based on GPS data. The development of big data is quite fast now, so the study of tensor factorization and its optimization methods is meaningful.
We use a joint tensor factorization method to predict missing tensor elements. The objective function is shown as the following.
The above equation can be optimized by the gradient descent method, which is an efficient optimization method. The primary update steps are described as follows.
We define the following equations.
Then we can take the following partial derivatives.
In this algorithm, we can obtain the update equations as in the following.
In the above equations, P is the tensor, Q1 is the 3D model similarity matrix of different clusters of 3D models, and Q2 is the sketching style similarity matrix. U is user-latent-feature matrix, V is the 3D model-latent-feature matrix and C is style-latent-feature matrix. The first part of the objective function is the pure third-order tensor factorization loss function. Our aim is to predict reasonable values for the missing elements in the third-order user- style- 3D model tensor, minimizing the losses. The second part of the objective function is to match the physical meaning of the decomposed feature matrix V with the 3D model content similarity matrix Q1. The similarity of any two 3D model clusters could be computed in the following way: Extract a representative 3D model of each 3D model cluster. Then the three views of a 3D model will be extracted and the feature of the three views will be attained. After the computing, the similarity of different 3D model clusters are obtained. So the 3D model- 3D model content similarity matrix Q1 would be attained, as shown in Fig. 6. And the loss function attained from this matrix is the second part of the objective function. The third part of the objective function is to match the physical meaning of the component of the feature matrix C with the sketching style similarity matrix Q2 as shown in Fig. 7. Users have preferences on styles. That is to say, users of the same style are probably to behave similarly. Therefore, we could obtain the style-style similarity matrix Q2 and then the loss function obtained from this matrix would be a part of the objective function. The rest parts of the objective function are the regularization terms used to alleviate the overfitting. We use the Algorithm 1 to optimize the objective function of joint tensor factorization.
The joint tensor factorization
The main steps of the algorithm are illustrated as the following. |D A | is the dataset.
Based on the joint tensor factorization mentioned above, we can obtain the new retrieval method as shown in Fig. 8. Here, we fuse the methods of joint tensor factorization and content-based method to obtain the new retrieval method. The new retrieval method can use the user drawing styles and user behavioral data to improve the performance of the 3D model retrieval. Here, the content-based method could utilize the Gabor filter-based method [39] to obtain the feature descriptors. It could be changed to other methods [40, 41] too. Figure 8 describes new retrieval method. After the user has drawn certain style of three views, the 3D models he/she mostly wants to retrieve would be predicted with joint tensor factorization. A final rank would be obtained by fusing the two methods of the joint tensor factorization and content-based method.
We propose a sketch-based 3D model retrieval method based on three-view sketches and sketching styles which are learned from the uses’ feedback data and can improve retrieval performance. According to our observation, there are different degrees of deviation in sketches. Our retrieval method does not require that users’ sketches have rather high precision and we permit deviation of sketches within a controllable scope.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grants Nos. 61232011, 61303001, 61572531 and 61320106008. This work is supported by Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase). This work is supported by the National Supercomputer Center in GuangZhou. This work is also supported by the Science and Technology Project of Guangdong Province under Grant No. 2015A010103010.
