Automatic detection of anthropometric landmarks based on spin images

Abstract

In this paper, we present a method to detect anthropometric landmarks on 3D human scans automatically. Our method is template-based: it builds correspondences between instance models and a template model. Spin images, which represent local shape features parametrically, are used to construct the similarity measure. A vertex on the instance model is regarded as a landmark when its similarity measure against the corresponding landmark on the template model is maximal. Normal variation and bounding boxes are employed to speed up the comparison of spin images. The algorithm is verified on scanned human bodies with various shapes, with forty-seven landmarks detected on each instance body. The results show that most landmarks are found accurately. Furthermore, we demonstrate how the detected landmarks can be used for automatic body measurement.

Keywords

Anthropometric landmark; automatic detection; spin image; normal variation; bounding boxes

With the development of 3D scanning technology, range-scanned surface data has become a valuable basis for building accurate digital models of real people. 3D surface data of the human body has many applications, such as anthropometric studies, medical diagnosis, garment design, computer animation and entertainment. High-resolution surface data, which captures the body shape in detail, has had a marked impact, especially on anthropometry and garment design. However, automatically positioning landmarks on scanned human bodies of various shapes is still a challenging problem. In this paper, we present a method that automatically extracts landmarks from the surface data by computing correspondences between instance models and a template model.

Previous work

Research on feature point detection aims to take full advantage of 3D body scanning. Dekker et al.¹ explored feature detection operators as the first step in reconstructing the body surface. Zhong et al.² defined target zones to locate feature points for body segmentation. Both methods required branching points such as the armpits and crotch to be located before further processing. Certain et al.³ described the extraction of tailoring measurements from a lower-body model consisting of two stacks of ellipses. Other groups have also proposed body feature extraction methods for made-to-measure (MTM) applications.⁴⁻⁶ Ju et al.⁷ presented an algorithm that segments 3D human body scans by analyzing cross-sections of the scanned body, especially the circumferences of horizontal slices. However, the number of detected feature points was limited and their accuracy was lower. Suikerbuik et al.⁸ introduced a more conventional approach that computes the surface curvature in regions of interest. Subburaj et al.⁹ segmented the model surface into landmark regions based on surface curvature and used these regions to identify landmarks on bone models. Such curvature-based results are inaccurate when the local surface curvature around a required feature is not pronounced.

Allen et al.¹⁰ proposed a template-based approach that fits a template human model to scanned instances by establishing correspondences between them. All the resulting models had the same number of triangles and point-to-point correspondences. A set of anthropometric landmarks provided in the CAESAR (Civilian American and European Surface Anthropometry Resource) database¹¹ was pre-marked manually on the template model. After the correspondence problem was solved by deforming the template model to fit the individual scans, the required features were determined. This template-based approach places no limit on the number of required landmarks. However, manual pre-marking, which requires 76 landmarks for each individual in the CAESAR database, increases the measurement time (from a few seconds to more than 30 minutes).

Anguelov et al.¹² also proposed a template-based method to register non-rigid surfaces using a Markov network. This algorithm was applied to 3D human scans¹³ by embedding an instance mesh into a template mesh. About 200 corresponding points between the template model and the instance model were selected, so the anthropometric landmarks of the instance model were identified once this pre-identification was complete. Azouz et al.¹⁴,¹⁵ extended this algorithm to locate landmarks more efficiently. A pair-wise network was treated as an instance of a probabilistic graphical model. Each node of the model was a random variable corresponding to the position of a landmark, and the edges of the network represented correlations between the positions of landmark pairs. The probability associated with a landmark encoded a preference for the local surface property around this landmark and maintained the spatial relationship between this landmark and its neighboring landmarks in the network. To characterize the local surface feature, they adopted a representation called the spin image.¹⁶ Spin images are traditionally used to describe local surface shape by compressing global 3D coordinates into 2D local coordinates accumulated in an image. The problem of establishing correspondences is thereby simplified to computing the similarities of a stack of 2D images. Starck et al.¹⁷ formulated the matching problem as a bijection in the 2D spherical domain to guarantee a continuous one-to-one surface correspondence without over-folding. Ghosh et al.¹⁸ used an ICP-like framework with spin images to detect feature correspondences iteratively when matching a template to a collection of possible target meshes.

In this paper, we propose a method to detect an arbitrary number of anthropometric landmarks on a scanned human body. The algorithm is based on template fitting. Inspired by the work of Ghosh et al.,¹⁸ we use spin images and Euclidean distance to establish correspondences for automatic landmark detection. However, unlike their whole-mesh registration, our method focuses on a simpler and more effective way to detect landmarks automatically. We also use the detected landmarks in subsequent applications such as body segmentation and skeleton extraction for body measurement.

Computing spin images

We formulate the problem of detecting corresponding landmarks as finding, for each identified landmark on a template mesh, the best match with similar shape features on the instance mesh. In other words, the matching criterion is a similarity measure that characterizes both the local surface properties and the spatial relationship of the landmarks. The following sections explain how this similarity is computed.

Spin images

We use spin images to characterize local surface features. The concept of the spin image was first introduced by Johnson,¹⁶ who used it to solve the surface matching problem. A spin image is a two-dimensional histogram computed at an oriented point of a surface mesh. Two cylindrical coordinates are defined with respect to the 3D position and the surface normal of the oriented point, and the dense point cloud of the mesh is projected onto this 2D coordinate system. Spin images are generated by accumulating the projected 2D points in discrete bins; darker pixels in the image indicate bins that contain more projected points. Figure 1 shows the coordinate system used to compute the spin image at an oriented point. The radial coordinate, α, is the perpendicular distance to the line through the surface normal, and the elevation coordinate, β, is the signed perpendicular distance to the tangent plane defined by the vertex normal and position.
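The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `spin_image` and the parameters `bin_width` (the width of one bin, in the mesh's length unit) and `n_bins` are hypothetical. Each point is counted in a single bin without interpolation. No support limit is applied beyond the image extent.

```python
import numpy as np


def spin_image(p, n, points, bin_width, n_bins):
    """Accumulate an (n_bins x n_bins) spin image at the oriented point (p, n).

    Rows index the elevation coordinate beta, columns the radial coordinate alpha.
    Points whose (alpha, beta) fall outside the image extent are ignored.
    """
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)                      # unit surface normal
    d = np.asarray(points, dtype=float) - p        # vectors from the base point
    beta = d @ n                                   # signed distance to the tangent plane
    # Radial distance to the line through p along n (clamped against round-off).
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
    extent = n_bins * bin_width
    image, _, _ = np.histogram2d(
        beta, alpha,
        bins=n_bins,
        range=[[-extent / 2.0, extent / 2.0], [0.0, extent]],
    )
    return image
```

Take a base point at the origin with its normal along +z. A point at (1, 0, 0) then has α = 1 and β = 0, while points at (0, 0, ±1) have α = 0 and β = ±1.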

Figure 1.

Cylindrical coordinate system.

As a representation of shape features, the spin image is generated in an object-centered coordinate system, which means that the surface description is view-independent. Hence surfaces can be compared directly without pre-alignment. In practice, the 3D positions of the mesh vertices are mapped into 2D using the spin map of each oriented point, and the mapped points are accumulated in discrete bins of an appropriate size. Here, 'bin size' refers to the number of bins in each image; it is determined according to the mesh resolution of the body scan.¹⁶ For illustration, the spin image in Figure 2 has 30 × 30 bins. Projecting the dense point cloud onto a limited number of bins compresses the abundant surface data, which speeds up the similarity comparison. It also reduces the effect of poor surface data on the mesh, such as clutter (extra data) and occlusion (missing data).

Figure 2.

Spin images for an oriented point on the surface of a human model. Left to right: (a) an oriented vertex of the human model; (b) the 2D point map with respect to the base vertex; (c) the 2D spin image.

Figure 3 shows examples of spin images generated on two different human scans, using five landmarks on each scan. The spin images corresponding to the same landmark on different scans are closely similar to each other.

Figure 3.

Spin images computed for five landmarks on two different human scans.

Matching algorithm

To establish a landmark correspondence between the instance model and the template model, the matched pair should have similar shape features and spatial positions. Based on these two criteria, we formulate the similarity measure, S, as:

(1)

In equation 1, the first part measures the similarity of the surface features, and the weight β takes values between 0 and 1. The second part measures whether the selected point is close enough to the corresponding landmark, where d stands for the Euclidean distance. The choice of β depends on the distinctiveness of the local surface feature and is discussed in detail later. C indicates the similarity of the generated spin images. It combines the linear correlation coefficient R with its variance, which depends on the number of overlapping pixels between the spin images:¹⁹

$$C = \left(\tanh^{-1} R\right)^{2} - \lambda \left(\frac{1}{N-3}\right) \qquad (2)$$

where N is the number of overlapping pixels used in the computation of R, and λ weights the variance against the expected value of the correlation. In practice, we define λ as:

(3)

where n is the number of spin images generated from the template scan, and N_i is the number of pixels in the ith spin image.
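Below is a NumPy sketch of the spin-image similarity C in the form C = (atanh R)² − λ/(N − 3) of Johnson and Hebert.¹⁹ Two details are assumptions of this sketch, not statements from the text. First, a pixel counts as overlapping when it is non-zero in both images. Second, R is clipped just inside ±1 so that atanh stays finite for identical images. λ is passed in as a parameter, so it can be computed from equation 3. The function name is hypothetical.

```python
import numpy as np


def spin_similarity(P, Q, lam):
    """Similarity C of two spin images of equal shape; -inf if too few pixels overlap."""
    P = np.asarray(P, dtype=float).ravel()
    Q = np.asarray(Q, dtype=float).ravel()
    overlap = (P > 0) & (Q > 0)       # assumed definition of overlapping pixels
    N = int(overlap.sum())
    if N <= 3:                        # the variance term 1/(N - 3) is undefined
        return -np.inf
    p = P[overlap] - P[overlap].mean()
    q = Q[overlap] - Q[overlap].mean()
    denom = np.sqrt((p * p).sum() * (q * q).sum())
    if denom == 0.0:                  # constant images: correlation undefined
        return -np.inf
    R = np.clip((p * q).sum() / denom, -0.999999, 0.999999)
    return float(np.arctanh(R) ** 2 - lam / (N - 3))
```

The penalty λ/(N − 3) shrinks as N grows. As a result, a match supported by many overlapping pixels scores higher than an equally correlated match supported by only a few.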

Pre-marking

We use a normal-sized, hole-free male body as the template mesh for our explanation. During whole-body scanning, subjects are required to stand with their arms held slightly apart from the torso, as illustrated in Figure 3. In this paper, we select the 29 landmarks that provide the most useful anthropometric data for clothing-related applications. To reduce errors in the collection of anthropometric data, we use the unified landmark definitions given by Simmons et al.²⁰ Furthermore, we pick four extra landmarks on the template model to assist automatic detection: points on the left and right fingertips and heels. In Figure 4, the red spheres are the main landmarks and the blue spheres are the auxiliary feature points. The anthropometric nomenclature of these landmarks is given in Table 1.

Figure 4.

Positions of 29 pre-defined anthropometric landmarks.

Table 1.

The anthropometric nomenclature of landmarks

1 Crown	16 Navel
2 Right neck	17 Right trochanterion
3 Left neck	18 Left trochanterion
4 Suprasternal	19 Right buttock
5 Cervicale	20 Left buttock
6 Right acromion	21 Right carpus
7 Left acromion	22 Left carpus
8 Right bustpoint	23 Crotch
9 Left bustpoint	24 Right patella
10 Right armpit	25 Left patella
11 Left armpit	26 Right gastrocnemius
12 Right olecranon	27 Left gastrocnemius
13 Left olecranon	28 Right malleolus
14 Right waist	29 Left malleolus
15 Left waist

After manual pre-marking, spin images with suitable parameters are generated at these points and stored for later comparison. The optimal spin-image parameters are determined by the average length of the mesh edges.¹⁶ In practice, the models have around 7000–8000 points, and we found that spin images with 100 × 100 bins are appropriate for shape comparison.
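The mesh resolution that guides the choice of spin-image parameters is the average edge length. The following is a minimal sketch, assuming the scan is stored as a vertex array and a triangle index array. The function name `mesh_resolution` is hypothetical. Each undirected edge shared by two triangles is counted once.

```python
import numpy as np


def mesh_resolution(vertices, faces):
    """Mean length of the unique (undirected) edges of a triangle mesh."""
    f = np.asarray(faces)
    # Collect the three edges of every triangle.
    edges = np.concatenate([f[:, [0, 1]], f[:, [1, 2]], f[:, [2, 0]]])
    # Sort each vertex pair so shared edges coincide, then keep one copy.
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    v = np.asarray(vertices, dtype=float)
    lengths = np.linalg.norm(v[edges[:, 0]] - v[edges[:, 1]], axis=1)
    return float(lengths.mean())
```

The resulting value is the reference scale from which the spin-image parameters are chosen. It is computed once per model, before any spin images are generated.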

Detecting landmarks

Since spin images are generated according to the oriented points, the features of the landmarks can be categorized by the variation of the surface normal. In this sense, 29 landmarks on the surface of a template scan are divided into three types that represent different shape characteristics.

Most landmarks have a palpable geometric feature and vary across subjects only in size and spatial location. The anatomical landmarks on each body segment are usually constrained by a spatial relationship that is the same for every model. We therefore use an additional characteristic to narrow the scope of landmark identification: the invariable position of a landmark relative to other known landmarks. This is achieved with bounding boxes built from the key landmarks.

Extraction of landmark regions

The equation we apply to evaluate the variation of the surface normal is defined as:

(4)

where m is the number of neighbors of the ith vertex, n_i is the normal of the ith vertex and n_j is the normal of its jth neighbor. A vertex is a neighbor if it satisfies two requirements: (a) the Euclidean distance d between the two vertices must be less than 20 cm; (b) the angle between the surface normals of the two vertices must be less than 90°.
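The neighborhood rule above can be sketched as follows. As an assumption, this sketch takes v_i to be the mean angle, in degrees, between n_i and the normals of its m neighbors; the exact expression of equation 4 may differ. The function name and the use of centimetres for coordinates are also assumptions.

```python
import numpy as np


def normal_variation(points, normals, i, radius=20.0, max_angle=90.0):
    """Assumed form of v_i: mean angle (degrees) between n_i and neighbouring normals.

    A vertex j is a neighbour of i if |x_j - x_i| < radius (cm) and the angle
    between n_i and n_j is below max_angle degrees.
    """
    X = np.asarray(points, dtype=float)
    Nrm = np.asarray(normals, dtype=float)
    Nrm = Nrm / np.linalg.norm(Nrm, axis=1, keepdims=True)
    dist = np.linalg.norm(X - X[i], axis=1)
    angles = np.degrees(np.arccos(np.clip(Nrm @ Nrm[i], -1.0, 1.0)))
    neighbours = (dist < radius) & (angles < max_angle)
    neighbours[i] = False                  # a vertex is not its own neighbour
    if not neighbours.any():
        return 0.0
    return float(angles[neighbours].mean())
```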

After the variation of the surface normal is computed for each vertex on the template scan, the vertices are classified into three viewpoint-independent surface types. Each type corresponds to a different value of β in the similarity measure S used to match landmarks between the template model and the instance model. Figure 5 illustrates the normal variation at each vertex, mapped to color values on the template surface. The definitions of the three surface types, the categories of the 29 landmarks and the choice of the weight β in equation 1 are given in Table 2.

Figure 5.

Variation map of surface normal on the template scan.

Table 2.

Different landmark types based on the variation of surface normal

Normal variation	v_i >65°	55°< v_i ≤65°	v_i ≤55°
Type	A	B	C
Landmarks (The reference numbers are given in Table 1.)	1, 2, 3, 6, 7, 10, 11, 23	4, 12, 13, 14, 15, 17, 18, 24, 25	5, 8, 9, 16, 19, 20, 21, 22, 26, 27, 28, 29
Value of β	0.7	0.5	0.3

The categorized normal variation implies that the landmarks on the template model fall into the same types of landmark region on the instance model. This builds a regional correspondence between the two models, and the search for each landmark on the instance model is carried out within regions of the same type.
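Table 2 translates directly into a lookup. The sketch below, with hypothetical names, assigns a surface type from v_i and returns the corresponding β. It also collects the instance vertices of the same type, which form the search region for a template landmark of that type.

```python
def landmark_type(v):
    """Surface type from normal variation v (degrees), following Table 2."""
    if v > 65.0:
        return "A"
    if v > 55.0:
        return "B"
    return "C"


# Weight beta of equation 1 for each surface type (Table 2).
BETA = {"A": 0.7, "B": 0.5, "C": 0.3}


def search_region(variations, landmark_kind):
    """Indices of instance vertices whose surface type matches landmark_kind."""
    return [j for j, v in enumerate(variations) if landmark_type(v) == landmark_kind]
```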

Detecting key landmarks

The key landmarks, which are classified as type A, are oriented points on the body surface whose shape characteristics are unique in their vicinity. In our observation, the key landmarks can always be detected precisely. Hence they are detected first on the instance scans.

The similarity measure S in equation 1 provides a way of comparing spin images: the two spin images with the highest similarity measure are likely to come from two vertices with the same surface feature. In the detection process, the instance model is scaled to the same height as the template model, and the inverse scaling is applied after landmark detection to restore its original size. For a known landmark on the template model, the spin image is compared with the spin images extracted from the same type of landmark region on the instance model. The detailed steps are as follows:

1. Select a key landmark q_i from the template model, and load its spin image Ω_qi;

2. For each type A vertex p_j on the instance model, if the Euclidean distance d between q_i and p_j is less than 20 cm and the angle between the normals of the oriented vertices q_i and p_j is less than 90°, load its spin image Ω_pj;

3. Use equation 1 to compute the similarity measure S;

4. Regard the type A vertex on the instance model with the maximum S as the matched key landmark.

In practice, the eight key landmarks can be extracted easily with these four steps, as shown in Figure 6 (red spheres).
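The four steps can be sketched as a single loop. The sketch is generic: the spin-image similarity is passed in as a function, and β comes from Table 2. The distance term of equation 1 is an assumption here. It is modelled as 1 − d/r, which equals 1 at the template landmark and falls to 0 at the 20 cm search radius r. All names are hypothetical.

```python
import numpy as np


def detect_landmark(q, nq, spin_q, points, normals, spins, types, target_type,
                    similarity, beta, radius=20.0, max_angle=90.0):
    """Index of the instance vertex with maximal S for template landmark q, or None."""
    q = np.asarray(q, dtype=float)
    nq = np.asarray(nq, dtype=float)
    nq = nq / np.linalg.norm(nq)
    cos_limit = np.cos(np.radians(max_angle))
    best, best_score = None, -np.inf
    for j, (p, n) in enumerate(zip(points, normals)):
        if types[j] != target_type:                # step 2: same region type only
            continue
        d = float(np.linalg.norm(np.asarray(p, dtype=float) - q))
        n = np.asarray(n, dtype=float)
        cos_angle = float(n @ nq / np.linalg.norm(n))
        if d >= radius or cos_angle <= cos_limit:  # outside 20 cm or normals >= 90 deg apart
            continue
        # Step 3: assumed form of equation 1 (spin-image term + proximity term).
        score = beta * similarity(spin_q, spins[j]) + (1.0 - beta) * (1.0 - d / radius)
        if score > best_score:                     # step 4: keep the maximum S
            best, best_score = j, score
    return best
```

Passing the similarity as a function keeps the search loop independent of how spin images are stored. A real `similarity` would compare two spin-image arrays; the test below uses scalars for brevity.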

Figure 6.

Five bounding boxes based on the detected key landmarks.

Bounding boxes

After detecting the key landmarks, we use them to build bounding boxes for further landmark detection. The bounding boxes enclose the body parts, i.e., the torso, left leg, right leg, left arm and right arm. They are used to estimate a more accurate relative position for each remaining landmark, against which the similarity measure is then computed.

To build the five bounding boxes that enclose the five parts of the human body, we need to detect four extra key landmarks that define the end of each branch: the left fingertip, right fingertip, left heel and right heel (blue spheres in Figure 4). From the variation map of the surface normal (Figure 5), they can be classified as type A landmarks. With these 12 key landmarks, five bounding boxes are built for the template and instance models respectively. In our observation, the thickness of the box along the z-axis has little effect on the spatial-relationship constraint. Hence, all boxes use a uniform thickness, determined by the maximum and minimum z-coordinates of all vertices on the mesh. The landmarks that determine the maximum and minimum values along each axis of the five bounding boxes are listed in Table 3, where the subscript gives the landmark index from Table 1. Figure 6 shows an example of building these five bounding boxes from the detected key landmarks.

Table 3.

Bounding boxes based on corresponding landmarks

Box	X_max	X_min	Y_max	Y_min	Z_max	Z_min
Torso	x₇	x₆	(y₂+y₃)/2	y₂₃	z_max	z_min
Left arm	x_{left finger tip}	x₁₁	y₇	y_{left finger tip}	z_max	z_min
Right arm	x₁₀	x_{right finger tip}	y₆	y_{right finger tip}	z_max	z_min
Left leg	x₇	x₂₃	y₂₃	y_{left heel}	z_max	z_min
Right leg	x₂₃	x₆	y₂₃	y_{right heel}	z_max	z_min
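Table 3 can be turned into box construction as follows. In this sketch, landmark coordinates are looked up by their Table 1 number, or by a descriptive name for the four branch-end points. These keys and the function name are hypothetical. Each box is assembled with the min/max of its defining coordinates, so the result does not depend on which side of the body lies toward positive x. The shared z-extent is passed in.

```python
def build_boxes(lm, z_min, z_max):
    """Five body-part boxes as (x_min, x_max, y_min, y_max, z_min, z_max) tuples.

    lm maps Table 1 numbers (1-29) and the names 'left fingertip',
    'right fingertip', 'left heel', 'right heel' to (x, y, z) coordinates.
    """
    def x(k):
        return lm[k][0]

    def y(k):
        return lm[k][1]

    def box(xs, ys):
        return (min(xs), max(xs), min(ys), max(ys), z_min, z_max)

    neck_y = (y(2) + y(3)) / 2.0  # midpoint of the right and left neck points
    return {
        "torso": box([x(6), x(7)], [neck_y, y(23)]),
        "left arm": box([x("left fingertip"), x(11)], [y(7), y("left fingertip")]),
        "right arm": box([x(10), x("right fingertip")], [y(6), y("right fingertip")]),
        "left leg": box([x(7), x(23)], [y(23), y("left heel")]),
        "right leg": box([x(23), x(6)], [y(23), y("right heel")]),
    }
```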

For illustration, we use the right arm as an example, as shown in Figure 7. The bounding box for the right arm is built from three key landmarks, i.e., the right finger tip, right acromion and right armpit. In the detection process, the position of a landmark p on the template model is mapped to a position p″ on the instance model by keeping its relative (normalized) coordinates within the corresponding bounding boxes of the template and instance models the same:

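$$x'' = X_{min_I} + \frac{x - X_{min_T}}{X_{max_T} - X_{min_T}}\left(X_{max_I} - X_{min_I}\right),\quad y'' = Y_{min_I} + \frac{y - Y_{min_T}}{Y_{max_T} - Y_{min_T}}\left(Y_{max_I} - Y_{min_I}\right),\quad z'' = Z_{min_I} + \frac{z - Z_{min_T}}{Z_{max_T} - Z_{min_T}}\left(Z_{max_I} - Z_{min_I}\right),\quad \text{with } p = (x, y, z),\ p'' = (x'', y'', z'') \qquad (5)$$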

where X_{min_I}, X_{max_I}, Y_{min_I}, Y_{max_I}, Z_{min_I} and Z_{max_I} are the minimum and maximum values of the bounding box for the instance model, and X_{min_T}, X_{max_T}, Y_{min_T}, Y_{max_T}, Z_{min_T} and Z_{max_T} are the minimum and maximum values of the bounding box for the template model.
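Equation 5 keeps the normalized coordinates of p unchanged when moving from the template box to the instance box. The following is a minimal NumPy sketch, assuming each box is given as (x_min, x_max, y_min, y_max, z_min, z_max). The function name is hypothetical.

```python
import numpy as np


def map_to_instance(p, box_t, box_i):
    """Map template point p to p'' by preserving its relative position in the boxes."""
    p = np.asarray(p, dtype=float)
    lo_t, hi_t = np.asarray(box_t[0::2], float), np.asarray(box_t[1::2], float)
    lo_i, hi_i = np.asarray(box_i[0::2], float), np.asarray(box_i[1::2], float)
    u = (p - lo_t) / (hi_t - lo_t)   # normalized coordinates inside the template box
    return lo_i + u * (hi_i - lo_i)  # same normalized coordinates in the instance box
```

For example, the centre of the template box always maps to the centre of the instance box, however the two boxes differ in size.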

Figure 7.

Using the bounding box of the right arm to detect the landmark on the right wrist.

Although p″ may not coincide with a vertex of the instance model, we always use it as the reference point for similarity comparison; accordingly, the spin image stored for landmark p is transferred to p″. Let p_i be a candidate vertex on the instance model. The search is again limited to the region where the Euclidean distance d between p_i and p″ is less than 20 cm and the angle between the oriented vertices p_i and p″ is less than 90°. By comparing the similarity measure S, we detect all the landmarks classified as type B and C in Table 2.

Experimental results and discussions

We applied our algorithm to over 200 human scans of both male (117) and female (112) subjects with a wide variety of body shapes. Figures 8 and 9 illustrate the girth and height values of the subjects, measured from the landmarks detected by our method. These measurements differ considerably across subjects, indicating that the method can handle a wide range of body shapes.

Figure 8.

Shape differences among female subjects.

Figure 9.

Shape differences among male subjects.

Figure 10 illustrates the results of our method for several typical body shapes, including an obese body (the fifth male body from the left in the first two rows).

Figure 10.

Examples of our experiment with 47 detected landmarks on each scan. The first model in each row is the template model for landmark detection, shown in front and back view respectively.

The 29 landmarks are shown as red spheres, and 18 extra landmarks are shown as blue spheres. Four of these extra landmarks are the aforementioned extra key landmarks that define the end of each branch (left fingertip, right fingertip, left heel and right heel). The other 14 extra landmarks are used for precise body segmentation (Figure 11) in automatic body measurement. It is advantageous to segment the human body into the torso and limbs before measuring circumferences and/or distances. Because each segmented branch is a cylinder-like object, automatic body measurement becomes easier and more accurate than when the whole body is treated as a single object. For instance, to measure the waist girth, a cutting plane through the most concave position on the torso segment will not include the cross-sections of the left and right arms. Figure 11 demonstrates automatic body segmentation and measurement based on the landmarks detected by our method. The skeleton of the instance model can also be extracted by placing the joints at the centers of the relevant girths. The detailed algorithms for body measurement and skeleton extraction can be found in our previous investigation.²

Figure 11.

Automatic body segmentation and measurement.

In computer animation, the skeleton is usually used to drive the skin through various motions, often with motion-capture data. In this type of application, the landmarks pre-marked on a motion-captured subject (the template) can easily be transferred onto an instance model. The motion can then be transferred accordingly and used in a 3D virtual dressing system to provide more realistic visual effects in animations such as a runway walk.

To verify the accuracy of our algorithm, we calculated the Euclidean distances between the predicted landmark positions and the corresponding positions placed manually according to their anthropometric definitions. Figure 12 shows the average error for each landmark.
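The per-landmark average error plotted in Figure 12 can be computed as below. This assumes the predicted and manually placed positions are stacked into arrays of shape (scans, landmarks, 3). The function name is hypothetical.

```python
import numpy as np


def mean_landmark_error(predicted, manual):
    """Mean Euclidean distance per landmark over all scans."""
    predicted = np.asarray(predicted, dtype=float)
    manual = np.asarray(manual, dtype=float)
    distances = np.linalg.norm(predicted - manual, axis=-1)  # shape (scans, landmarks)
    return distances.mean(axis=0)
```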

Figure 12.

Average errors of the 29 automatically detected landmarks against manually placed landmarks.

For all 29 landmarks, the largest average error is less than 1.4 cm, which is consistent with the error expected when landmarking and measuring manually.²¹

Aside from anthropometric landmark positioning, our method can be used more flexibly to determine customized landmarks. For instance, in 3D bra design or comfort evaluation, it is important to locate the breast area on the 3D human body. Figure 13 illustrates the breast zones detected for different female body shapes. Nineteen landmarks around the target area were pre-defined on the template model (Figure 13a).

Figure 13.

Results of breast zone detection for female scans based on our algorithm. Left to right: (a) template model with 19 customized landmarks; (b, c, d) automatically detected landmarks on different female scans.

To evaluate the performance of our method, we implemented the algorithm on a PC with a dual-core 2.8 GHz CPU and 1 GB of memory. Detecting all 47 landmarks (including the extra landmarks used for skeleton extraction and body segmentation) on an instance scan with around 7000–8000 points usually takes less than 30 seconds. We also applied the algorithm to scans with different mesh resolutions. The results show that the time spent per landmark depends on the number of points searched in the region of interest. This number can be controlled through search parameters such as the search radius and the permissible normal variation. Since the dimension of the spin images (bin size) is set according to the mesh resolution,¹⁶ landmark detection using spin images is sensitive to mesh quality. We found that a higher point density gives more accurate results but a longer running time. The running times for detecting all 47 landmarks on scans with different mesh resolutions are shown in Figure 14.

Figure 14.

Running times for landmark detection on scans with different mesh resolutions.

Although most landmarks can be detected accurately, landmarks located in regions without significant surface features, such as the navel (the fourth scan in the first row of Figure 10), may be less accurate. This inaccuracy arises because such landmarks are not clearly identifiable by a surface feature descriptor; it may also be caused by poor surface reconstruction during body scanning. Another limitation is that if a pre-marked landmark deviates from its anthropometric position, the detected landmark on the instance model inherits this error, a problem common to template-based approaches.

Conclusion

In this paper, we presented a method for automatically detecting anthropometric landmarks on 3D scanned models. Spin images computed at each oriented point are adopted to describe the local surface features. Landmark detection is performed by comparing a similarity measure, defined on the spin images, between known landmarks on the template model and candidate vertices on the instance model. Normal variation and bounding boxes are employed to speed up the comparison. Experimental results show that accurate landmark detection can be achieved in a short time with our method. In addition, body segmentation and skeleton extraction, two common applications, can easily be carried out using the detected landmarks.

Footnotes

Funding

This work was supported by the Natural Science Foundation of China (grant number 60973072) and the Fundamental Research Funds for the Central Universities (grant number 2011D10102).

References

1. Dekker, Douros, Buxton, Treleaven. Building symbolic information for 3D human body modeling from range data. In: Second International Conference on 3-D Digital Imaging and Modeling (3DIM '99), 1999, p. 388.

2. Zhong. Automatic segmenting and measurement on scanned human body. Int J Cloth Sci Technol 2006; 18(1): 19–30.

3. Certain, Stuetzle. Automatic body measurement for mass customization of garments. In: Second International Conference on 3-D Digital Imaging and Modeling (3DIM '99), 1999, p. 405.

4. Huang, Chen. Body scanning and modeling for custom fit garments. J Textile Apparel Technol Management 2002; 2: 1–10.

5. Pargas, Staples, Davis. Automatic measurement extraction for apparel from a three-dimensional body scan. Optics Lasers Eng 1997; 28: 157–172.

6. Leong, Fang, Tsai. Automatic body feature extraction from a marker-less scanned human body. Computer-Aided Design 2007; 39: 568–582.

7. Ju, Werghi, Siebert. Automatic segmentation of 3D human body scans. In: Computer Graphics and Imaging (CGIM 2000), 19–23 November 2000, pp. 239–244.

8. Suikerbuik, Tangelder. Automatic feature detection in 3D human body scans. In: Proceedings of SAE Digital Human Modeling Conference, paper 04-DHM-52.

9. Subburaj, Ravi, Agarwal. Automated identification of anatomical landmarks on 3D bone models reconstructed from CT scan images. Computerized Medical Imaging and Graphics 2009; 33: 359–368.

10. Allen, Curless, Popovic. The space of human body shapes: reconstruction and parameterization from range scans. ACM Trans Graphics 2003; 22(3): 587–594.

11. Robinette, Daanen, Paquet. CAESAR project: a 3-D surface anthropometry survey. In: Second International Conference on 3-D Digital Imaging and Modeling (3DIM '99), 1999, pp. 380–386.

12. Anguelov, Srinivasan, Koller, Thrun, Pang, Davis. The correlated correspondence algorithm for unsupervised registration of nonrigid surfaces. Adv Neural Inf Proc Systems 2005; 17: 33–40.

13. Anguelov, Srinivasan, Koller, Thrun, Rodgers, Davis. SCAPE: shape completion and animation of people. ACM Trans Graphics 2005; 24(3): 408–416.

14. Azouz, Shu, Mantel. Automatic locating of anthropometric landmarks on 3D human models. In: Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06), 2006, pp. 750–757.

15. Wuhrer, Azouz, Shu. Semi-automatic prediction of landmarks on human models in varying poses. In: 2010 Canadian Conference on Computer and Robot Vision, 2010, pp. 136–142.

16. Johnson, Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Analysis Machine Intelligence 1999; 21(5): 433–449.

17. Starck, Hilton. Spherical matching for temporal correspondence of non-rigid surfaces. In: International Conference on Computer Vision (ICCV), 2005, pp. 1387–1394.

18. Ghosh, Sharf, Amenta. Feature-driven deformation for dense correspondence. In: Medical Imaging 2009: Visualization, Image-Guided Procedures, and Modeling, 2009.

19. Johnson, Hebert. Surface matching for object recognition in complex three-dimensional scenes. Image Vision Computing 1998; 16: 635–651.

20. Simmons, Istook. Body measurement techniques. J Fashion Market Manage 2003; 7(3): 306–332.

21. Gordon, Bradtmiller, Clauser, Churchill, McConville, Tebbetts. Anthropometric survey of US Army personnel: methods and summary statistics. Technical Report TR-89-044. Natick, MA: US Army Natick Research, Development and Engineering Center, 1989.