Abstract
Virtual try-on synthesizes garments for the target bodies in 2D/3D domains. Although existing virtual try-on methods focus on redressing garments, virtual try-on of hair, shoes and wearable accessories remains under-researched. In this paper, we present the first general method for virtual try-on that is fully automatic and suitable for many items including garments, hair, shoes, watches, necklaces, hats, and so on. Starting with pre-defined wearable items on a reference human body model, an automatic method is proposed to deform the reference body mesh to fit a target body in order to obtain dense triangle correspondences. Then, an improved fit metric is used to represent the interaction between wearable items and the body. Next, with the help of the triangle correspondences and the fit metric, the wearable items can be quickly and efficiently inferred from the shape and posture of the target body. Extensive experimental results show that, besides being automatic and efficient, the proposed method can be easily extended to dynamic try-on by applying rigging and importing motion capture data, and that it can handle both tight and loose garments, and even multi-layer clothing.
E-commerce and mobile commerce have become part of our lives, with an increasing number of consumers preferring to buy clothing online. According to a report from China Internet Network Information Center (http://www.cnnic.net.cn), clothing, shoes and hats are among the top five categories of products that people prefer to shop for online. Despite the large online transaction figures, it is still challenging to sell clothes online due to the high return rate. Research has found that the average online apparel retailer experiences a return rate of 28%, and 80% of these returns stem from fit issues. 1 The value of fit information for the online retailer is further demonstrated by Gallino et al. 2 Traditional pre-defined pattern sizes, such as S, M, L, XL and XXL, alleviate this problem only to a limited extent; moreover, the size variations across different apparel brands are usually significant.
Virtual fitting room technologies have been introduced to simulate how a product fits an individual customer.3–6 However, many of the existing methods require that the targeted human body has a similar pose to the reference body mesh.7–9 In addition, user-specified feature points, skeletal information or segmentation are necessary.10–13 Some works claimed to be automatic, which is true only if the skeleton and skin weight are given.12,14 Inspired by the movements of human beings, researchers put a skeleton consisting of several virtual joints under the surface of a human body and bound each vertex of the body surface onto several nearest joints. By linearly regressing the rotation of the binding joints based on skin weight, the vertex position will be updated in real time. However, even if several works15,16 have moved toward automatically rigging the body mesh, the results of these approaches are not always reliable.
The existing methods of garment redressing can be roughly classified into two categories, according to the existence of a reference body mesh. In the first category, only a garment mesh is given without a reference body model, and the problem can be described as follows: taking a 3D garment mesh and a target body mesh as the input, and outputting a dressed target body model. Li et al. 17 allow users to manually select feature points and then use curvature and torsion to match the selected feature points. This is not an automatic method and cannot handle human meshes in various postures. Huang et al. 18 proposed automatically aligning a 3D garment mesh with a target body model through feature point calculation and posture alteration, but this method only works for zygomorphic garment meshes, which highly limits its use in many practical applications. To address this problem, Tisserand et al. 14 and Wu et al. 19 proposed new error metrics to fit the rigged body into the garment mesh by finding the optimal posture. However, they work well only if the skeleton is perfectly generated, which usually requires manual preparation. Duan et al. 20 proposed an automatic method that segments the garment mesh into patches according to the feature lines, and then sews them onto the target body model via geometric processing. This method overcomes the dependence on the skeleton, but it can only be used for an A-pose target body mesh. Moreover, due to the lack of a reference body, the clothing fit information is missing.
The second category assumes the existence of a reference body, and the problem can be stated as transferring a 3D garment mesh fitted on the reference body model to another target body mesh. An additional key aspect, compared with the methods without a reference body, is to preserve the clothing fit during garment transfer. Brouet et al. 10 proposed a method that preserves the clothing grade during garment redressing, which is useful for clothing customization, but their method requires pre-rigged reference and target bodies in similar poses. Zhong et al. 21 presented a method based on duplicating the postures, where the skeletons of the reference body and the target body are calculated from the anthropometric features and the posture error is compensated by an affine transformation. However, this method has to assume that the target body has a similar standard posture, such as A-pose. Lee et al. 12 segmented the reference body mesh into six segments and assigned garment vertices to one of these parts, resulting in a pose-independent fitting. This method does not depend on the skeleton, but it cannot handle long and wide clothing, such as dresses, due to incorrect segmentation. Jiang et al. 13 deformed the target body to fit the reference body in order to obtain an initial redressing state, and starting from this point, collision handling and physically based simulation are applied to resolve the penetration problems while the fitted target body is being reverted to its original posture. However, this method requires user adjustment to obtain an accurate skeleton.
Compared with virtual try-on of garments, virtual try-on of shoes, hair, eyeglasses, hats and other wearable accessories is under-researched. Chou et al. 22 proposed the first image-based virtual try-on method for shoes. Yang et al. 23 used an RGBD camera to track the foot and align the shoes with the foot. However, these methods are not designed for redressing other wearable items.
In this paper, a fully automatic method for dressing wearable items onto bodies of various shapes and postures is proposed. More specifically, we aim to transfer wearable items from a reference body to another target body without any user intervention. To this end, an optimization-based method to deform the reference body mesh to fit the target body mesh is introduced. After that, inspired by the idea of skinning techniques which bind mesh vertices onto the skeleton, we represent the clothing fit as the local relationship between each vertex of the garment and its nearest K triangles. Such a simple but efficient representation can preserve the clothing fit and ensure a fast garment transfer even for high-resolution, multi-layer garments. To obtain a more realistic geometry, physically based simulation can be applied. Finally, the proposed method is extended to mesh sequences for dynamic virtual try-ons.
Methodology
In this section, the proposed method is introduced. A mesh model
Overview of the proposed method: (a) given reference garments G fitted on a reference body model B; (b) input the target body model S; (c) novel clothing fit representation that accounts for the interaction between garments and the body; (d) deform the body model B to fit S via the proposed automatic approach; (e) a simple yet efficient method is proposed to redress garments
Body fitting
To redress the garment, we start with deforming B to S to obtain
Correspondences
Garment fitting is mostly a problem of shape registration. As a first step, in order to select the correspondences, it is necessary to perform shape registration.24,25 Many works have explored methods to find accurate sparse feature points26,27 or dense correspondences in 3D space.28–30 In this study, we follow this trend and define the surface metric based on correspondence positions. We apply the state-of-the-art approach called 3D-CODED, proposed by Groueix et al., 29 to find dense correspondences. 3D-CODED takes two body meshes as inputs and deforms a pre-defined body template mesh to fit both inputs, from which the dense correspondences are obtained. The method overview of 3D-CODED is illustrated in Figure 2(a). Two scanned bodies,
Method overview of 3D-CODED: (a) high-resolution template; (b) and (c) source body and target body, respectively; (d) and (e) deformed templates for (b) and (c); (f) dense correspondences between (b) and (c), which are visualized by drawing lines between correspondences.
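As a rough illustration of how dense correspondences can be read off two deformed copies of one template, the following sketch matches each source vertex to the nearest vertex of the template fitted to the source, carries that template index over to the template fitted to the target, and then snaps to the nearest target vertex. This is a simplified reading of the idea and not the 3D-CODED implementation; the function names and the brute-force nearest-neighbor search are assumptions made for the example.

```python
import numpy as np


def nearest_index(queries, points):
    """Index of the nearest row of `points` for every row of `queries` (brute force)."""
    # Pairwise squared distances, shape (len(queries), len(points)).
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def dense_correspondences(source, target, template_on_source, template_on_target):
    """Map every source vertex to a target vertex through a shared template.

    `template_on_source` and `template_on_target` are assumed to be the same
    template mesh (identical vertex order) deformed to fit each input.
    """
    # Source vertex -> closest vertex of the template fitted to the source.
    template_ids = nearest_index(source, template_on_source)
    # The same template vertices, located on the template fitted to the target.
    carried = template_on_target[template_ids]
    # Snap the carried positions onto actual target vertices.
    return nearest_index(carried, target)
```

The brute-force search keeps the sketch self-contained; for scanned bodies with many vertices, a spatial index would be the more practical choice.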
Objective function
We resolve the shape and posture registration simultaneously by solving an objective function, which is similar to the one defined by Sumner et al. 31 Compared to Sumner et al., 31 we make two main improvements: (a) we make the method fully automatic; and (b) we avoid updating the correspondences by finding the nearest neighbors.
Intuitively, the distance between
The distance between a deformed point
An additional smoothness term is used to regularize the deformation. We penalize the difference of transformation of neighboring triangles under the Frobenius norm
The third contributor to our objective function is the identity term, which indicates that all transformations should be equal to the identity matrix. This term helps to prevent a drastic mesh deformation in order to achieve optimal smoothness. The identity term is thus defined as
The full objective function is a weighted sum of the above three terms given by
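To make the structure of this weighted sum concrete, a minimal sketch is given here. It assumes that each triangle carries a 3×3 transformation, that the closeness term is the sum of squared distances between deformed vertices and their correspondences, and that the weights are called w_c, w_s and w_i; these names and the exact form of each term are illustrative assumptions in the spirit of Sumner et al., not the paper's equations.

```python
import numpy as np


def closeness_term(deformed_vertices, correspondences):
    """Sum of squared distances between deformed points and their correspondences."""
    return float(((deformed_vertices - correspondences) ** 2).sum())


def smoothness_term(transforms, adjacency):
    """Squared Frobenius norm of the difference between neighboring triangle transforms.

    `adjacency` is a list of (i, j) index pairs of neighboring triangles.
    """
    return float(sum(np.linalg.norm(transforms[i] - transforms[j], "fro") ** 2
                     for i, j in adjacency))


def identity_term(transforms):
    """Squared Frobenius distance of every triangle transform from the identity."""
    return float(((transforms - np.eye(3)) ** 2).sum())


def objective(deformed_vertices, correspondences, transforms, adjacency,
              w_c=1.0, w_s=1.0, w_i=1.0):
    """Weighted sum of the three terms (weight names are hypothetical)."""
    return (w_c * closeness_term(deformed_vertices, correspondences)
            + w_s * smoothness_term(transforms, adjacency)
            + w_i * identity_term(transforms))
```

In this sketch the energy is only evaluated; minimizing it requires expressing the transformations in terms of the unknown vertex positions, which is the purpose of the vertex formulation.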
Vertex formulation
In equation (4), there are three unknown variables, that is
Specifically, for a triangle with vertices
Given a rotation matrix
It can be noted that t can be removed by subtracting the first equation from the others
Naturally, r is rewritten as
As
Similarly, equation (2) is rewritten as
Based on the above reformulations, the complete objective function (equation (4)) is minimized by setting its derivative to zero and solving a linear system. 32
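The elimination of the translation can be illustrated with a short sketch of the per-triangle construction used by Sumner et al. 31, which the reformulation above appears to follow. A fourth vertex is placed along the triangle normal, the first vertex is subtracted from the others to form a 3×3 edge matrix, and the transformation is recovered as the deformed edge matrix times the inverse of the undeformed one. The fourth-vertex rule and the function names are taken from that earlier work or invented for the example, so they should be read as assumptions.

```python
import numpy as np


def edge_matrix(v1, v2, v3):
    """Columns are v2 - v1, v3 - v1 and v4 - v1, where v4 lies along the normal."""
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    # Dividing by sqrt(|n|) makes the third column scale like an edge length.
    e3 = n / np.sqrt(np.linalg.norm(n))
    return np.column_stack([e1, e2, e3])


def triangle_transform(undeformed, deformed):
    """3x3 transformation mapping an undeformed triangle onto its deformed version.

    Both inputs have shape (3, 3), one vertex per row. The translation drops
    out because only differences with respect to the first vertex are used.
    """
    V = edge_matrix(*undeformed)
    V_tilde = edge_matrix(*deformed)
    return V_tilde @ np.linalg.inv(V)
```

Because each entry of the resulting matrix is linear in the deformed vertex positions, the smoothness and identity terms become quadratic in those positions, which is consistent with the minimization reducing to a linear system.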
Once the minimization problem is fully solved, the deformed source body mesh
Redressing wearable items
Our goal is to redress wearable items from B to S. As shown in Figure 1, in order to transfer G so that it fits onto S, we first find a mapping from B to S, which guides the desired deformation of G. To find such a mapping, B is deformed to fit S as introduced in the previous sections. Hence, the deformed body,
Fit metric
Clothing fit, as the most important aspect in virtual try-on, should be preserved during garment redressing.6,10 We assume that the garments created are all fitted to the reference body mesh. In the work of Hu et al., 6 a metric called 3D garment vector field is proposed to quantitatively represent the dynamic clothing fit by attaching each vertex on the garment to its closest triangle on the body mesh. We propose a similar fit representation where the main difference is that we attach each garment point to K (K > 1) nearest triangles rather than one triangle, as shown in Figure 3. When computing the K nearest triangles, a KD-tree is applied to find the closest centroids of the triangles.
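The nearest-triangle lookup can be sketched as follows. For the sake of a self-contained example, the KD-tree is replaced by a brute-force search over triangle centroids, which returns the same neighbors but scales poorly; the function names and the default K = 5 are assumptions for illustration.

```python
import numpy as np


def triangle_centroids(vertices, faces):
    """Centroid of every triangle; `faces` holds vertex indices, shape (F, 3)."""
    return vertices[faces].mean(axis=1)


def k_nearest_triangles(garment_vertices, body_vertices, body_faces, k=5):
    """Indices of the k body triangles whose centroids are closest to each garment vertex."""
    centroids = triangle_centroids(body_vertices, body_faces)
    # Pairwise squared distances, shape (num_garment_vertices, num_triangles).
    d2 = ((garment_vertices[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Sort each row by distance and keep the k closest triangles, nearest first.
    return np.argsort(d2, axis=1)[:, :k]
```

For high-resolution meshes, a spatial index such as `scipy.spatial.cKDTree` built on the centroids and queried with the same k would avoid the quadratic memory cost of the distance matrix.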
Fit representation: (a) our method; (b) method from Hu et al. 6
Redressing
As introduced in the Fit metric section, each vertex of G is attached to K nearest triangles of B. As B and
The redressing algorithm can usually generate a plausible garment mesh on the target body. For more realistic garment geometry, physically based simulation can be further applied to our redressed garments, as shown in Figure 1(f).
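One way to realize such a transfer is sketched here: each garment vertex is expressed in a local frame attached to each of its K bound triangles on the reference body, and its new position is obtained by evaluating the same local coordinates on the corresponding triangles of the deformed body and averaging the K predictions. The local-frame construction, the uniform averaging and all function names are assumptions made for illustration; the paper's exact fit metric may differ.

```python
import numpy as np


def frame(triangle):
    """Origin and 3x3 basis (two edges plus a scaled normal) of a triangle."""
    a, b, c = triangle
    e1, e2 = b - a, c - a
    n = np.cross(e1, e2)
    e3 = n / np.sqrt(np.linalg.norm(n))  # scales with triangle size
    return a, np.column_stack([e1, e2, e3])


def bind(garment_vertices, body_vertices, faces, neighbors):
    """Local coordinates of each garment vertex in each of its K bound triangles.

    `neighbors` has shape (G, K) and holds triangle indices per garment vertex.
    """
    coords = np.empty(neighbors.shape + (3,))
    for g, p in enumerate(garment_vertices):
        for k, f in enumerate(neighbors[g]):
            origin, basis = frame(body_vertices[faces[f]])
            coords[g, k] = np.linalg.solve(basis, p - origin)
    return coords


def redress(coords, deformed_body_vertices, faces, neighbors):
    """Rebuild garment vertices on the deformed body by averaging K predictions."""
    out = np.zeros((coords.shape[0], 3))
    for g in range(coords.shape[0]):
        for k, f in enumerate(neighbors[g]):
            origin, basis = frame(deformed_body_vertices[faces[f]])
            out[g] += origin + basis @ coords[g, k]
        out[g] /= neighbors.shape[1]
    return out
```

Since the reference body and its deformed version share the same triangles, binding is done once and only the decoding step depends on the target. Degenerate (zero-area) triangles are not handled in this sketch.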
Dynamic virtual try-on
Compared with a statically fitted garment, dynamic virtual try-on is closer to real-world try-on. The common real-time solution is to bind garment vertices to the skeleton, which is similar to body model rigging. Each vertex of the garment can copy the skinning weights from the nearest vertex on the body. However, manual intervention is necessary to ensure good quality. The common offline solution is to compute the geometry of clothing according to the movement of the body model by means of cloth simulation. However, it is not easy to obtain the optimal simulation parameters such as stiffness and friction, and a poorly chosen set of these parameters will result in a softer cloth simulation. In contrast to these methods, we extend our garment redressing method to obtain dynamic virtual try-ons.
Rigging and motion
Many works about mesh animation have been reported.33,34 In this study, linear blend skinning (LBS) 15 technology is applied to animate the target body due to its cheap computation and good performance. We first generate a skeleton and compute the skinning weights using the method proposed by Feng et al. 16 By controlling the joint angles, we can vary the body posture. In order to obtain a realistic animation, the motion of a real person is usually recorded, and then the motion is retargeted to a virtual character to reproduce the same motion. The technology of recording motion is called motion capture. 35
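For reference, the core of linear blend skinning can be written in a few lines. The sketch assumes that each joint's pose is given as a 4×4 homogeneous transformation (already expressed relative to the rest pose) and that the skinning weights of every vertex sum to one; the function name and array layout are assumptions for the example.

```python
import numpy as np


def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform vertices by blending joint transformations with skinning weights.

    vertices:         (N, 3) rest-pose positions
    weights:          (N, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous transformation per joint
    """
    # Homogeneous coordinates, shape (N, 4).
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    # Every vertex transformed by every joint, shape (J, N, 4).
    per_joint = np.einsum("jab,nb->jna", joint_transforms, homogeneous)
    # Weighted sum over joints, shape (N, 4).
    blended = np.einsum("nj,jna->na", weights, per_joint)
    return blended[:, :3]
```

Each output position is a linear combination of the rest-pose vertex under the joint transformations, which is why the technique is cheap enough for real-time use.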
Clothing animation
We bind garments to the body surface via our fit representation. For a sequence of body meshes, the garments are fitted onto the body of the first frame. All frames of the body model have the same vertex order, so we apply the proposed garment transfer algorithm to dress all frames. Only linear operations are involved in this process, which enables real-time dynamic virtual try-on.
Results and discussion
To evaluate the proposed approach, a fashion expert was hired to model garments, hats, shoes, necklaces and watches fitted on the reference models via Houdini (https://www.sidefx.com/products/houdini/). Human models from the FAUST (Fine Alignment Using Scan Texture) 36 datasets and SMPL-based (Skinned Multi-Person Linear Model) 37 data, as well as other resources, are used to test various combinations for fitting.
Results
Same posture but different bodies
As shown in Figure 4, clothes with three layers are prepared on the reference body (Figure 4(d)). The target bodies in A-poses have different shapes, and our method can successfully redress multi-layer garments onto bodies of various shapes. The surface of the redressed clothes is smooth, and penetration issues are avoided by the fit metric in our approach.
Redressing multi-layer garments onto bodies with the same posture: (a) a reference body; (b), (c) and (d) are each layer of cloth fitted on the reference body; (e), (h) and (k) are bodies with the same pose; and the rest are the redressing results using our method.
Same body but different postures
Transferring garments to the same body in various poses is illustrated in Figure 5. It can be seen that the geometry of the redressed garment is inferred by the shape and posture of the target body. The garments on the reference body (Figure 5 center) are smooth, but realistic details (more wrinkles) can be synthesized by our method.
Redressing garments onto the same body in varying postures.
Different postures and bodies
Previous methods usually require the reference to be in a fixed posture, for example A-pose. In Figure 6, two reference bodies with different shapes and poses wear two kinds of clothes (Figure 6(a)). These garments are transferred to another two target bodies with different shapes and poses, respectively. This experiment shows that our method works when the target and reference bodies differ in both shape and pose.
Redressing garments onto different posed bodies from a non-standard reference body: (a) reference garments and bodies with non-standard poses; (b) target bodies; (c) and (d) are the redressed garments onto (b).
Other wearable items
In another example, the hair, eyeglasses, watch and shoes are transferred from a male reference body to another three target bodies (two males and one female); the hat, necklace and shoes are transferred from a female reference body to these target bodies, as shown in Figure 7. It illustrates that the proposed method is a generic method of transferring wearable items to the target bodies with various shapes and poses.
Results of wearable items including eyeglasses, watch, shoes, hat and necklace, or even for the hair.
Comparisons
We designed the comparisons with respect to the following: body fitting, binding number, garment redressing and clothing animation.
Body fitting
Our method needs to deform the reference body to fit the target body in order to find one-to-one triangle correspondences. For a fair comparison, we used the body template from Groueix et al. 29 as the reference body, and then deformed it to fit the target bodies. A male scan and a female scan (Figure 8(a)) are selected from the FAUST dataset, and a deep learning-based method (Figure 8(b)) and an optimization-based method (Figure 8(c)) are compared with our method (Figure 8(d)). Our method integrates Groueix et al.’s 29 method to find dense point correspondences between the reference body and the target body. It can be seen that the fitting result from Groueix et al. 29 is not accurate, as there are a lot of penetrations (Figure 8(b)). Sumner et al.’s 31 method performs better than Groueix et al.’s, 29 but needs manually specified landmarks. To quantitatively compare the fitting error, the average closest point distance between the deformed body and the target body is computed, which is called the chamfer distance. 38 Finally, the chamfer distance is defined as
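A minimal sketch of this error measure follows, based only on the verbal description above: for every vertex of the deformed body, the distance to the closest vertex of the target body is taken, and these distances are averaged. Whether the paper uses this one-directional form or a symmetric variant is not stated here, so the `symmetric` switch and the function name are assumptions.

```python
import numpy as np


def chamfer_distance(deformed, target, symmetric=False):
    """Average closest-point distance from `deformed` to `target` vertices.

    With symmetric=True, the mean of both directions is returned instead.
    """
    # Pairwise Euclidean distances, shape (len(deformed), len(target)).
    d = np.sqrt(((deformed[:, None, :] - target[None, :, :]) ** 2).sum(axis=2))
    forward = d.min(axis=1).mean()   # deformed -> target
    if not symmetric:
        return float(forward)
    backward = d.min(axis=0).mean()  # target -> deformed
    return float(0.5 * (forward + backward))
```

The value is expressed in the same unit as the vertex coordinates, so meshes given in millimeters yield an error in millimeters.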

Binding number
During the transfer of wearable items, we bind each vertex of the items onto its corresponding triangles of the body. In Figure 9, we compare the effect of different binding numbers on the quality of the redressed garments. When the binding number is set to one, the smoothness of the garment cannot be well-preserved (Figure 9(b)). When the binding number is increased to three (Figure 9(c)), the smoothness is significantly improved. In our experiments, we find that setting the binding number to five usually yields better results (Figure 9(d)).
Redressed results using different binding numbers: (a) a reference body and reference garments; (b) results of binding one triangle; (c) results of binding three triangles; (d) results of binding five triangles.
Garment redressing
To validate the performance of the proposed method on real scanned data, we select the scanned body and scanned garments from Duan et al. 20 as references, as shown in Figure 10(a); the garments are redressed onto the target scanned bodies shown in Figure 10(b), which are also taken from Duan et al. 20 We also compare our method with the latest work proposed by Duan et al. 20 In Figure 10(c), it can be seen that penetration exists when using the method of Duan et al., 20 and the garment scale cannot be well-preserved (notice the red box). It is obvious that the reference garments on the reference body are loose (see Figure 10(a)), while the results from Duan et al. 20 shown in Figure 10(c) illustrate tight garments. In contrast, our method can efficiently avoid the penetration issue and preserve the loose geometry of the garments; the results of the proposed approach are shown in Figure 10(d).
Comparison of garment redressing: (a) reference body and reference garments; (b) target scanned bodies; (c) results from Duan et al.; 20 (d) results from our method.
Clothing animation
In order to evaluate the performance of the proposed method in dynamic virtual try-on, we compared our results with the popular real-time animation technology of LBS. 15 As shown in Figure 11(a), we automatically rig the body model using the Maya quick rig tool (www.autodesk.com), and import the Kung Fu motion to generate the body sequence. The skin weights of the body can be easily copied to the garments by finding nearest neighbors, then a dressed character animation is generated by LBS (Figure 11(b)). The physically based simulation result from Houdini is shown in Figure 11(c) and our result is illustrated in Figure 11(d). Since physically based simulation is computationally expensive and time-consuming, we run the physically based simulation offline. Then, we treat the physically based result as the ground truth to compute the error for LBS and our results. Table 2 reports the numerical comparison for Figure 11; these results show that our method performs better. Table 3 shows the vertex number and triangle number of the testing garments. Figure 12 compares the local geometry of the garments. It can be seen that our method generates a smoother and more realistic garment surface than LBS.
Comparison of clothing animation: (a) body mesh sequences; (b) results from linear blend skinning; (c) results from physically based simulation; (d) results from our method.
Local comparison of garments: (a) results from LBS; (b) results from our method; (c) results from physically based simulation.
Numeric comparisons of Figure 11. The physically based simulation result is treated as the ground truth, and the error is quantified by the chamfer distance (in millimeters). The maximum values are highlighted in bold.
Number of vertices and triangles used in the testing garments.

Conclusions and future work
We have presented a fully automatic method to redress 3D wearable items onto bodies with various shapes and poses. The proposed method can be useful for visualizing the try-on appearance for customers and for quickly designing a dressed character for computer animations and games. To the best of our knowledge, it is the first generic virtual try-on method for wearable items. The experiments show that the method handles reference/target models with complex postures, and also handles multi-layer garments. Owing to its fast computation, the proposed method can be extended to directly generate dynamic virtual try-on. In summary, the main contributions are as follows:
We propose a novel fully automatic method to redress 3D wearable items, including garments, shoes, eyeglasses, necklaces, hats, watches and even hair, from a reference body mesh to a target body mesh. To the best of our knowledge, this is the first generic method for virtual try-on of wearable items.
We propose a novel pipeline for automatically deforming a template body mesh to fit a target body mesh.
Extensive experiments show that the proposed method can be used to redress a variety of wearable items. The geometry of the redressed items depends on the shape and posture of the target body.
The main limitation of our current work is that it fails on dresses. This is because our fit representation cannot assign a garment vertex to both legs, which results in cloth tearing. Future work will address this issue.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been supported by INNOVIRIS (project BRGRD24) under the project eTailor Explore 11b 2018 and Fonds Wetenschappelijk Onderzoek (FWO), Vlaanderen (project G084117).
