Abstract
Visual recognition of 3D point cloud data of bridge inspection scenes is a key step in automating the visual inspection process, which is currently largely manual and inefficient. To alleviate the lack of large-scale annotated point cloud datasets for training such 3D visual recognition algorithms, this research investigates an approach for developing large-scale synthetic point cloud datasets. The proposed approach proceeds in four steps: (1) random generation of different types of bridges in computer graphics environments; (2) sampling of camera trajectories that represent data collection scenarios during bridge inspection; (3) 3D reconstruction using Structure from Motion (SfM) applied to rendered synthetic images; (4) automated annotation of the reconstructed point cloud using ground truth masks obtained with the synthetic images. In addition, this research proposes storing per-point uncertainty information, defined by the error between the ground truth depth and the depth calculated from the SfM results. Prior to training, thresholds can be applied to this uncertainty information to control the level of outliers in the dataset. This research demonstrates the proposed approach by generating point cloud datasets for two data collection scenarios. The effectiveness of the generated datasets is investigated by training 3D semantic segmentation algorithms and evaluating their performance on real and synthetic point cloud data. The proposed approach for point cloud dataset generation will facilitate the development of generalizable, high-level-of-detail 3D recognition algorithms toward autonomous bridge inspection.
Keywords
Introduction
Bridges play a vital role in the safety of people and the development of the economy by providing critical links in transportation systems. In many countries, bridges make up a large part of the infrastructure, and their effective management is a growing concern. For example, more than 600,000 bridges, about 14.7 bridges per 100 miles of public roadway, exist in the United States (Merten, 2023), among which 46,154 bridges (7.5% of total bridges) are structurally deficient or functionally obsolete. Approximately 42% of these bridges are at least 50 years old, with an estimated 125 billion U.S. dollars needed for repair (ASCE, 2021). Europe has more than 1234 km of road bridges longer than 100 m, many of which were built in the 1950s and have exceeded their design life (European Commission, 2019). In Japan, 23% of bridges are more than 50 years old (MLIT Road Bureau, 2017), requiring increased attention for their safe and functional operation. To ensure the continued transport of people and goods, these bridges should be maintained appropriately, based on an understanding of their structural conditions.
The conditions of those bridges are typically assessed by routine structural inspections. During the inspection, structural damage, such as cracks, corrosion, and wear problems, is identified and assessed. Bridge inspections are stipulated in many countries, including the United States (every 24 months for highway bridges (FHWA, 2004)) and Japan (every 5 years (MLIT Road Bureau, 2017)). During those inspections, structural problems in bridges are detected and solved in time, leading to extended service life and reduced maintenance costs. However, traditional bridge inspection mainly depends on manual visual inspection, which is time-consuming, costly, inefficient, subjective, and unsafe in special circumstances such as post-disaster inspections; improvement in the efficiency and accuracy of the current visual inspection process is desired.
Point cloud data can provide rich information that could aid the bridge inspection process in terms of both accuracy and efficiency (Lin et al., 2021). Point clouds are composed of a large number of discrete points. By processing point clouds, information about bridge structures, such as geometries, deformations, and topological relations between bridge parts and components (e.g., columns are below decks), can be extracted, in addition to surface texture maps (colors and normals). Such point cloud data can be readily collected by light detection and ranging (LiDAR) technology or by Structure from Motion (SfM) (Sánchez-Aparicio et al., 2023). In structural inspection, point clouds are often used for structural shape and geometry analysis (Li et al., 2023; Yu et al., 2022), surface damage detection (such as cracks and spalling) (Feng et al., 2023; Ghadimzadeh Alamdari and Ebrahimkhanlou, 2024; Kasireddy and Akinci, 2022), structure segmentation (Kim et al., 2020), and digital twins and Bridge Information Modeling (BrIM) (Mafipour et al., 2023; Perry et al., 2020). For example, Ye et al. (2018) segmented bridge components and detected geometric deformations in bridge structural components using point cloud data. Gao et al. (2024) proposed an effective framework for assessing local damage on a planar surface and synchronizing it with a digital twin (DT). Kim et al. (2020) classified bridge structural components using bridge point cloud data. Point cloud data can also be used to store and visualize recognized inspection results as a condition-aware model (Spencer et al., 2019), or to plan UAV navigation paths for autonomous bridge inspection (Narazaki et al., 2022). One of the challenges of these point cloud processing approaches is the massive data size, which makes the use and management of point cloud data extremely inefficient (Qu et al., 2014).
Deep learning-based approaches have been investigated to handle large 3D point cloud data effectively. Qi et al. (2016) proposed PointNet, which directly uses 3D points as network inputs and outputs either a class label for the entire input point cloud or a label for each point. PointNet learns a spatial encoding of each point and then aggregates the features of all individual points into a global point cloud signature. To enhance the network’s ability to capture local structure, Qi et al. (2017) proposed PointNet++, which processes a set of points sampled hierarchically in a metric space. Li et al. (2018) proposed PointCNN, which exploits the spatially local correlation of the data represented in point clouds by generalizing classical convolutional neural networks (CNNs). Wang et al. (2019) proposed Dynamic Graph CNN (DGCNN) with a new neural network module named EdgeConv, which captures topological information about local features. Hu et al. (2020) proposed an efficient and lightweight neural architecture, termed RandLA-Net, to directly infer the per-point semantics of large-scale point clouds. To date, these algorithms have been applied successfully to various indoor and outdoor segmentation tasks.
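The core PointNet idea described above (the same per-point encoder applied to every point, followed by a symmetric aggregation) can be illustrated with a minimal NumPy sketch. The weights here are random and untrained, the layer sizes (3 → 16 → 64) are arbitrary assumptions, and the segmentation input is simplified; it only shows why max-pooling makes the global signature independent of point order.

```python
import numpy as np

def shared_mlp(points, weights, biases):
    """Apply the same MLP to every point independently (a "shared" MLP).

    points: (N, 3) array. Returns per-point features of shape (N, C_out).
    """
    h = points
    for w, b in zip(weights, biases):
        h = np.maximum(h @ w + b, 0.0)  # linear layer followed by ReLU
    return h

def global_signature(points, weights, biases):
    """Symmetric max-pooling over points -> one (C_out,) vector per cloud."""
    return shared_mlp(points, weights, biases).max(axis=0)

rng = np.random.default_rng(0)
# Hypothetical, untrained weights mapping 3 -> 16 -> 64 channels
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 64))]
biases = [np.zeros(16), np.zeros(64)]

cloud = rng.random((4096, 3))                    # one cloud of 4096 points
shuffled = cloud[rng.permutation(len(cloud))]    # same points, different order

g1 = global_signature(cloud, weights, biases)
g2 = global_signature(shuffled, weights, biases)

# Simplified segmentation input: per-point features + tiled global signature
local = shared_mlp(cloud, weights, biases)
seg_input = np.concatenate([local, np.tile(g1, (len(cloud), 1))], axis=1)
print(g1.shape, np.allclose(g1, g2), seg_input.shape)  # (64,) True (4096, 128)
```

For per-point labels, PointNet combines each point's local feature with the global signature, which is what the concatenation mimics; the actual network uses deeper layers, input/feature transforms, and trained weights.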
In bridge inspection contexts, deep learning-based point cloud segmentation algorithms have been applied to structural component recognition tasks. Kim et al. (2020) proposed an automatic framework to recognize bridge components based on PointNet. In that study, full-scale RC bridge point clouds were divided into subspaces, which were then segmented into deck, pier, pier cap, and background using PointNet. The results showed that the framework is capable of recognizing bridge components from 3D point clouds of full-scale bridges. This research was extended in Kim et al. (2023) to the automatic recognition of bridge components based on close-range images collected by UAV. PointNet++ was trained for structural component segmentation using subspaces of bridge point clouds obtained by the SfM approach. Lee et al. (2021) proposed a graph-based hierarchical DGCNN (HGCNN) model based on the concepts of PointNet and DGCNN. The proposed algorithm was applied to the segmentation of bridge components into deck, pier, abutment, pole, and background classes. Xiao et al. (2024) developed RandLA-BridgeNet based on RandLA-Net and segmented bridge point clouds into background, pier, superstructure, and parapet. The results showed that the overall accuracy of RandLA-BridgeNet was 97.1%, with good performance across all component classes. These point cloud semantic segmentation algorithms are expected to be a key technical step toward autonomous bridge inspection by providing an understanding of the critical structural components in a scene.
The segmentation performance of deep learning algorithms is closely related to the size and quality of the point cloud dataset. To enrich datasets, and partially motivated by the successful use of synthetic images for 2D visual recognition in structural inspection (Hoskere et al., 2022; Narazaki et al., 2021, 2024; Zhai et al., 2022), researchers have often leveraged synthetic point cloud data (Jing et al., 2022; Lamas et al., 2024; Yang et al., 2022). Such synthetic data are typically generated by sampling from mesh surfaces, leading to uniform and clean point clouds that differ from actual point cloud data collected in the field. Moreover, meshes of only a few bridges, or a few bridge types, are typically considered for specific application scenarios, which raises generalizability issues. Effective synthetic data generation approaches that can produce realistic (e.g., SfM-like) point cloud data for a variety of bridge types and data collection scenarios (e.g., camera trajectories) have not been fully investigated.
This research proposes an approach for developing a large-scale synthetic point cloud dataset that can represent individual data collection scenarios for the inspection of various types of bridges and their components. First, synthetic environments of six types of bridges are generated randomly and automatically using Random Bridge Generator (RBG), a platform for developing computer vision-based bridge inspection algorithms. Then, 2D images are collected by simulating the data collection by UAV flying along random trajectories. The bridge point cloud is generated from the collected images using Structure from Motion method. The point cloud is automatically annotated by using ground truth maps obtained from the synthetic environments. At the same time, the proposed approach stores the uncertainty information of each point defined by the error between the ground truth depths and the depths calculated from the Structure from Motion results. The dataset with different levels of outliers can be provided by applying a threshold to this uncertainty information before training the network. To validate the effectiveness of the proposed approach, two datasets are generated by simulating UAV-based image collection process: (1) a local dense point cloud dataset, and (2) a global sparse point cloud dataset. Point cloud semantic segmentation algorithms, such as PointNet++ and RandLA-Net, are trained on those two datasets, respectively. The performance is evaluated on real and synthetic datasets to validate the potential of the proposed approach for 3D visual recognition of bridge inspection.
The contributions of this research to autonomous bridge inspection are: (1) This research enables the automated and random generation of annotated large-scale bridge point cloud datasets for UAV-based image collection and SfM-based 3D reconstruction scenarios. The data generated by simulating the entire bridge inspection process contain important patterns of real-world data, such as the level of detail achieved in the specific target inspection scenario, nonuniform point distributions (caused by viewpoints, occlusions, and surface textures), and SfM noise at both local and global scales. (2) The proposed approach stores the uncertainty information of the points, enabling the flexible definition, inclusion, and removal of noisy points by simple thresholding. The network trained using the synthetic data generated by the proposed approach can recognize noisy points explicitly (as a “noisy point” class), as opposed to existing approaches that require a denoising method in addition to a deep learning-based semantic segmentation algorithm. (3) This research demonstrates the effectiveness of the entire synthetic data generation process of the proposed method through validation on both real and synthetic data. In particular, the effectiveness of the proposed approach in improving visual recognition accuracy is clarified through comparison with an existing approach that samples points directly from mesh surfaces.
The next section discusses the related work and research objectives. Then, the proposed methodology for generating point cloud datasets is discussed, followed by the implementation and evaluation of the proposed data generation approach. Finally, concluding remarks are provided.
Related work
Semantic segmentation with synthetic 3D point cloud dataset
[Table 1: The existing research.]
Jing et al. (2022) generated a large-scale masonry arch bridge point cloud dataset with a synthetic simulator. Firstly, this research generated four basic components based on quadrilaterals and partial cylinders: span, abutment, pier, and wingwall. These components are combined into 3D bridge models, including straight and curved bridges. The synthetic arch bridge point clouds were generated by sampling points from the mesh and removing parts of the point clouds randomly. The synthetic arch bridge dataset was labeled manually, and was used for training a semantic segmentation algorithm, termed BridgeNet therein. The effectiveness of the synthetic dataset was evaluated by testing the network on a real arch bridge dataset.
Lamas et al. (2024) generated synthetic point clouds of two types of truss bridges (Bailey and Brown truss bridges) by parametric modeling approach and uniform sampling from the mesh surface. Firstly, this research calculated the nodes of the bridge, and created the components by defining the cross-section and placing them in appropriate locations. Two datasets were generated: (1) complete point cloud dataset realized by uniform sampling, and (2) incomplete point cloud dataset that simulates occlusion effects. Each point cloud stored semantic and instance information with four segmentation classes: deck, chord, parallel and diagonal. A semantic segmentation network, termed JSNet, was applied to segment the generated point cloud data.
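The mesh-sampling strategy used in the studies above (and later in this research as a baseline) generally amounts to area-weighted uniform sampling of triangle surfaces. The following is a minimal sketch of that generic technique, not the specific code of Jing et al. or Lamas et al.; `sample_mesh_surface` is a hypothetical helper, and the two-triangle "deck" is an illustrative assumption.

```python
import numpy as np

def sample_mesh_surface(vertices, faces, n_points, rng=None):
    """Uniformly sample points on a triangle mesh surface.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Triangles are chosen with probability proportional to their area, and a
    point is drawn uniformly inside each chosen triangle.
    """
    rng = np.random.default_rng() if rng is None else rng
    tri = vertices[faces]                                   # (F, 3, 3)
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    a, b, c = tri[idx, 0], tri[idx, 1], tri[idx, 2]
    return ((1 - r1)[:, None] * a
            + (r1 * (1 - r2))[:, None] * b
            + (r1 * r2)[:, None] * c)

# Example: a 10 m x 2 m rectangular "deck" made of two triangles
verts = np.array([[0, 0, 0], [10, 0, 0], [10, 2, 0], [0, 2, 0]], dtype=float)
tris = np.array([[0, 1, 2], [0, 2, 3]])
pts = sample_mesh_surface(verts, tris, 4096, np.random.default_rng(0))
print(pts.shape)  # (4096, 3)
```

Because every surface patch receives points in proportion to its area, the result is the uniform, outlier-free distribution that the following discussion contrasts with SfM reconstructions; occlusion effects such as those of Lamas et al. would have to be simulated separately, e.g., by removing points.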
Overall, three major challenges are identified from the research presented in Table 1. First, most existing research samples points from mesh surfaces to obtain point cloud data, leading to a uniform distribution of points without outliers. However, the point clouds generated by such approaches tend to have significant differences in point distributions from the ones obtained in the field, particularly when SfM and UAV-imagery are used (complexities introduced by viewpoints, occlusions, and surface textures). Secondly, most existing approaches focus on a single or a few types of bridges, and their extension to other types of bridges and components is needed. Finally, some of the synthetic datasets rely on manual labeling, which is still time-consuming, labor-intensive, and prone to incorrect labeling. By using randomly generated synthetic environments to simulate the entire process of bridge inspection, including UAV-based image collection and 3D reconstruction using SfM, annotated large-scale point cloud datasets are generated in this research. This process enables the dataset to represent important and diverse patterns of the real-world data efficiently and automatically.
Research objective
In this research, a method for generating large-scale synthetic bridge point cloud data based on UAV image data is proposed. The proposed approach is based on the random and procedural generation of six types of bridges: slab bridge, beam bridge, girder bridge, arch bridge, suspension bridge, and cable-stayed bridge. This research collects 2D images in randomly generated bridge environments by simulating UAV data collection process, and generates point cloud datasets using the SfM method. The point clouds are automatically labeled using the 2D ground truth label maps obtained from synthetic environments. Before training, the level of outliers of the point cloud can be adjusted by applying a threshold to the point uncertainty data, defined by the error between the ground truth depth and the depth of the SfM results. This research can facilitate the development of deep learning-based autonomous bridge inspection system by providing a rich source of point cloud data for 3D visual recognition of bridges.
Synthetic data generation approach
Overview of the proposed approach
The research framework for the development of a large-scale synthetic point cloud dataset is shown in Figure 1. Building on the authors’ previous work on the Random Bridge Generator (RBG) (Cheng et al., 2024), a platform for developing computer vision-based bridge inspection algorithms, the proposed approach proceeds in two stages. The first stage simulates the process of data collection by UAV in bridge inspection environments. This research proposes two ways of setting up random yet plausible camera paths to collect local and global synthetic bridge images, respectively. In the second stage, the synthetic data collected in the first stage are used to generate dense and sparse point clouds using the SfM method. To handle the outliers in the point clouds generated by SfM, this research adopts a method that manages them through per-point uncertainty information. By combining point labels and point uncertainty information, rich data with accurate point annotations can be generated efficiently. Each stage of this framework is discussed in the following sections.
[Figure 1: The framework of the proposed approach.]
Random Bridge Generator
This research uses the Random Bridge Generator to generate synthetic environments for different types of bridges. RBG, an algorithmic platform for providing synthetic environments, can generate six different types of bridges randomly, automatically, and procedurally (Cheng et al., 2024), as shown in Figure 2. RBG generates slab bridges, beam bridges, steel girder bridges, arch bridges, suspension bridges, and cable-stayed bridges with different trajectories by randomly assembling different types of components.
[Figure 2: Six types of bridges from RBG.]
[Table: The parameters of the bridges.]
Realistic 3D environments are created and rendered by assigning textures to each component. To mimic the complexity of real-world backgrounds, this research imports 3D models of city sections (Import of Google 3D Cities, 2024) and places the bridge models in those background models. The scenes imported as backgrounds include Chicago in the United States, Kobe in Japan, and Lugano in Switzerland, representing different types of environments ranging from busy cities to scenic areas. The imported background models are shown in Figure 3. In this research, a 3D model of one of these areas is chosen randomly and imported into the Blender scene, and the generated bridge model is then placed at a random location in that background model, increasing both the diversity and robustness of the bridge images.
[Figure 3: The backgrounds of the bridges.]
Synthetic image collection process
This research simulates UAV-based image collection in synthetic bridge environments by the following three steps: (1) establishing the scenarios for simulating the data collection process, (2) generating random camera trajectories according to the selected scenarios, and (3) rendering images by placing camera models along the trajectory. Blender 3D computer graphics modeling software with its Python Application Programming Interface (API) (Blender 4.0 Python API Documentation — Blender Python API) is used to implement the steps discussed in this section. These steps are described in detail below.
Scenarios for simulating data collection
The proposed approach replicates realistic UAV-based image collection scenarios in the synthetic environment to produce rich and relevant synthetic datasets. This research investigates two data collection scenarios that are particularly important from the perspective of the application of point cloud data: detailed assessment of bridge components using dense point cloud data, and the recognition of global structural system using sparse point cloud data.
(i) Scenario 1: detailed assessment of bridge components using dense point cloud data.
This scenario simulates the process in which a UAV collects close-range images of critical structural components to assess their surface damage (Kim et al., 2023). In this scenario, close-up images near the bridge components are collected, and bridge component types are recognized from those images. With limited global contexts, recognizing bridge component types in each close-up image is a challenging task. In such cases, dense point cloud data obtained from collected images can help the registration of each image and damage detected therein in the global structure. A synthetic dense point cloud dataset that mimics this close-range image collection scenario would facilitate the recognition of bridge components for detailed structural assessment.
(ii) Scenario 2: recognition of global structural system using sparse point cloud data.
This scenario simulates the process in which a UAV collects images of an entire bridge, rather than obtaining a dense reconstruction of selected parts. The point clouds obtained in this scenario are sparse point clouds that contain global information about the bridge. Such point clouds would be most beneficial for: (1) automated Bridge Information Modeling (BrIM) and digital twin technologies; (2) mobile robot navigation and path planning, among many others.
Generation of camera trajectories in synthetic environments
This section provides detailed descriptions of the process of determining camera trajectories for the two scenarios discussed in the previous section.
(i) Camera trajectory for Scenario 1: detailed assessment of bridge components.
This research realizes this scenario by sampling random camera trajectories inside a “local” bounding box that contains parts of the bridge structure. The dimensions of the bounding box are …
[Figure: The bounding box for sampling cameras of local bridges.]
[Figure: The voxels in the bounding box and the occupied voxels of local bridges.]
[Figure: The different camera paths for local bridges.]
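The details of the local trajectory sampling are not recoverable here (the box dimensions and the sampling rule are missing), but the figure captions indicate that the bounding box is voxelized and that the voxels occupied by the structure are identified. The following is a minimal sketch of one plausible reading, assuming camera positions are generated by a random walk through unoccupied voxels with 6-connectivity; `random_voxel_path`, the grid size, and the slab-shaped occupancy are hypothetical and are not the procedure used in this research.

```python
import numpy as np

# 6-connected neighbor offsets in a voxel grid
NEIGHBORS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                      [0, -1, 0], [0, 0, 1], [0, 0, -1]])

def random_voxel_path(occupied, box_min, voxel_size, n_steps, rng):
    """Random walk through free voxels; returns (M, 3) camera positions (voxel centers).

    occupied: boolean array (nx, ny, nz), True where the structure occupies a voxel.
    The walk never enters occupied voxels, never revisits a voxel, and stops
    early if it reaches a dead end.
    """
    free = np.argwhere(~occupied)
    current = free[rng.integers(len(free))]           # random free start voxel
    path, visited = [current], {tuple(current)}
    shape = np.array(occupied.shape)
    for _ in range(n_steps - 1):
        cand = current + NEIGHBORS
        cand = cand[np.all((cand >= 0) & (cand < shape), axis=1)]  # stay in the box
        cand = [c for c in cand
                if not occupied[tuple(c)] and tuple(c) not in visited]
        if not cand:
            break
        current = cand[rng.integers(len(cand))]
        path.append(current)
        visited.add(tuple(current))
    return box_min + (np.array(path) + 0.5) * voxel_size  # voxel index -> center

rng = np.random.default_rng(1)
occ = np.zeros((20, 10, 8), dtype=bool)
occ[:, 2:8, 3] = True                  # hypothetical deck slab, one voxel thick
cams = random_voxel_path(occ, np.array([0.0, 0.0, 0.0]), 1.0, 30, rng)
```

Working on voxels keeps collision checks trivial (a lookup in `occupied`); a real implementation would likely also enforce a minimum clearance from the structure and smooth the resulting path.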


This research determines the camera rotations in this scenario using camera points of interest (POIs) and spherical linear interpolation, as shown in Figure 7. First, the distance from each camera to each bridge component is calculated, and the nearest component and the shortest distance to that component are identified. Then, among consecutive cameras in the sequence that share the same nearest component index, the point with the smallest of these shortest distances is selected as the POI. In addition to the POIs selected in this way, the POIs at the start and end of the path are specified as the nearest points on the structure. The rotations for the remaining camera locations along the path are then determined by spherical linear interpolation to ensure coherence and overlap between the captured images. Finally, to further improve the overlap, the camera trajectory is up-sampled to a camera spacing of 0.5 m (with the rotations interpolated), as illustrated in Figure 8.
[Figure 7: Camera rotation setting method for detailed assessment of bridge components. (a) Visual illustration of the process, and (b) corresponding data in a table format.]
[Figure 8: Example camera trajectories before and after up-sampling.]
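The key-camera selection, spherical linear interpolation, and 0.5 m up-sampling described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the cameras are roll-free look-at cameras, so interpolating the unit viewing direction with slerp determines the rotation (full quaternion slerp would be needed if roll were also interpolated); "in a sequence" is read as a run of consecutive cameras; and all function names and the example path are hypothetical.

```python
import numpy as np

def select_key_cameras(nearest_comp, min_dist):
    """Within each run of consecutive cameras sharing the same nearest component,
    keep the camera closest to that component (one reading of the POI rule)."""
    keys, start = [], 0
    for i in range(1, len(nearest_comp) + 1):
        if i == len(nearest_comp) or nearest_comp[i] != nearest_comp[start]:
            keys.append(min(range(start, i), key=lambda k: min_dist[k]))
            start = i
    return keys

def slerp(u, v, t):
    """Spherical linear interpolation between unit vectors u and v, t in [0, 1].
    Antipodal directions (omega = pi) are not handled."""
    omega = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    if omega < 1e-8:                      # (nearly) identical directions
        return u
    return (np.sin((1 - t) * omega) * u + np.sin(t * omega) * v) / np.sin(omega)

def upsample_path(waypoints, key_idx, pois, spacing=0.5):
    """Resample a camera path at a fixed spacing and slerp the view directions.

    waypoints: (M, 3) camera positions; key_idx: sorted, distinct waypoint indices
    that have a POI (including 0 and M - 1); pois: (K, 3) POIs for those indices.
    Returns (positions, unit view directions), both of shape (S, 3).
    """
    seg = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])          # arc length at each waypoint
    s_new = np.arange(0.0, s[-1] + 1e-9, spacing)        # up-sampled arc lengths
    pos = np.stack([np.interp(s_new, s, waypoints[:, k]) for k in range(3)], axis=1)
    d_key = pois - waypoints[key_idx]                    # view directions at keyframes
    d_key /= np.linalg.norm(d_key, axis=1, keepdims=True)
    s_key = s[key_idx]
    dirs = []
    for si in s_new:
        j = int(np.clip(np.searchsorted(s_key, si, side="right"), 1, len(s_key) - 1))
        t = np.clip((si - s_key[j - 1]) / (s_key[j] - s_key[j - 1]), 0.0, 1.0)
        dirs.append(slerp(d_key[j - 1], d_key[j], t))
    return pos, np.array(dirs)

wps = np.array([[x, 0.0, 5.0] for x in range(5)], dtype=float)   # straight 4 m path
pois = np.array([[0.0, 5.0, 5.0], [4.0, 0.0, 0.0]])               # look sideways, then down
pos, dirs = upsample_path(wps, [0, 4], pois)                      # 9 cameras, 0.5 m apart
```

In practice, `key_idx` would be the sorted union of the first and last waypoint indices and the output of `select_key_cameras`, with one POI per key camera.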

Many camera-equipped UAVs have constraints in their pitch angles, e.g., …
[Figure 9: Two approaches for determining the pitch angle.]
(ii) Camera trajectory for Scenario 2: recognition of global structural system.

This research realizes this scenario by defining a “global” bounding box that contains the entire bridge, and sampling waypoints on the surface of that bounding box. First, the tight bounding box of the bridge is obtained, and the external bounding box is set up by expanding the tight bounding box. The dimensions of the external bounding box are determined by extending the tight bounding box by a distance …
[Figure: The external bounding box for sampling cameras of global bridges.]
[Figure: The camera path of global bridges.]

This research determines camera rotations by setting up camera POIs. First, the external bounding box and the camera trajectory on its surface are scaled down by 50% (Figure 12). The resulting internal bounding box has points that correspond to the camera waypoints on the external bounding box, and these points are set as the POIs. No constraint on the camera pitch angles is imposed for this global reconstruction scenario (Approach 1 in Figure 9).
[Figure 12: The rotation of the cameras of global bridges.]
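The global setup described above can be sketched as follows. The sketch assumes an axis-aligned bounding box, waypoints drawn uniformly on the surface of the external box, and "scaled down by 50%" meaning scaling about the box center; the margin value, the example bridge extents, and all function names are hypothetical, and the ordering of the waypoints into a flight path is not reproduced.

```python
import numpy as np

def external_box(tight_min, tight_max, margin):
    """Expand the tight bounding box of the bridge by `margin` on every side."""
    return tight_min - margin, tight_max + margin

def sample_box_surface(box_min, box_max, n, rng):
    """Draw n points uniformly on the surface of an axis-aligned box."""
    size = box_max - box_min
    # Area of one face orthogonal to each axis (each axis has two such faces)
    areas = np.array([size[1] * size[2], size[0] * size[2], size[0] * size[1]])
    axis = rng.choice(3, size=n, p=areas / areas.sum())
    pts = box_min + rng.random((n, 3)) * size
    side = rng.integers(0, 2, size=n)                 # 0 -> min face, 1 -> max face
    pts[np.arange(n), axis] = np.where(side == 1, box_max[axis], box_min[axis])
    return pts

def global_pois(waypoints, box_min, box_max, scale=0.5):
    """Scale waypoints toward the box center; the scaled points serve as POIs."""
    center = 0.5 * (box_min + box_max)
    return center + scale * (waypoints - center)

rng = np.random.default_rng(0)
bmin, bmax = external_box(np.array([0.0, 0.0, 0.0]),
                          np.array([60.0, 12.0, 15.0]), margin=10.0)
wps = sample_box_surface(bmin, bmax, 50, rng)
pois = global_pois(wps, bmin, bmax)
view_dirs = pois - wps      # unconstrained view directions (no pitch limit)
```

Because each POI lies on the segment between its waypoint and the box center, every camera looks inward toward the bridge, while the 50% scaling keeps the views from all converging on a single point.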
Camera placement and image rendering
According to the above simulation scenarios and the associated camera trajectory settings, the synthetic cameras are placed in the synthetic bridge environments. The focal length of the camera is 24 mm, and the clip distance of the camera is 5000 m.
(i) Image rendering for Scenario 1: detailed assessment of bridge components.
In this research, a total of 399 synthetic bridge environments are generated, and 29,530 close-up images of bridge components are collected with a resolution of …
[Figure: Example synthetic images collected for Scenario 1.]
(ii) Image rendering for Scenario 2: recognition of global structural system.

In this research, a total of 220 synthetic bridge environments are generated, and 83,118 images of global bridge structures are collected with the resolution of …
[Figure: Example synthetic images collected for Scenario 2.]
Synthetic point cloud data generation process
3D reconstruction by structure from motion
Bridge point cloud data is obtained from the collected synthetic images by applying SfM. In this research, this step is implemented using the Python API of Metashape (Metashape Python Reference Release 1.8.2 Agisoft LLC, Agisoft, 2022). The detailed process for the two scenarios discussed previously is described in the following subsections.
(i) Scenario 1: dense point cloud generation.
[Table: The parameters for generating local dense point clouds.]
[Figure: Example dense point clouds for Scenario 1.]
(ii) Scenario 2: sparse point cloud generation.
[Table: The parameters for generating global sparse point clouds.]
[Figure: Example sparse point clouds for Scenario 2.]
Automated point cloud labeling process
This research adopts an automated approach for point cloud annotation, because manual annotation of point cloud data is extremely time-consuming and error-prone. The SfM point cloud data of the bridge can be annotated by one of the following methods: (1) registering points to the 3D ground truth model, or (2) mapping labels from rendered 2D ground truth maps to the corresponding 3D points. If the 3D ground truth model is used directly to annotate the point cloud data, the model must first be aligned to the point cloud, and then the correspondences between each point and the different parts of the 3D model must be identified. Both of these steps (model alignment and point assignment) are estimation problems and may be susceptible to errors caused by SfM inaccuracies and by differences between the 3D reconstruction and the 3D ground truth model. In contrast, when the 2D ground truth label maps are used to annotate the point cloud data, the mapping between the points in the SfM point cloud and the 2D images can be read directly from the SfM results, which is more efficient and accurate (no additional estimation step is needed). Therefore, this research maps 2D labels to the point cloud to obtain the annotations.
The process of projecting ground truth labels to the 3D point cloud data takes the following steps. First, 3D points, …
[Figure: Example raw labeled projection results.]
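Because the detailed labeling steps are truncated above, the following is only a minimal sketch of the general idea of reading 2D ground truth labels at the pixels where SfM observed each 3D point. It assumes that the SfM results provide, for each point, a list of `(image_id, u, v)` observations with `u` as the column and `v` as the row, and that conflicting views are resolved by majority vote; both the input layout and the voting rule are assumptions, and `label_points` is a hypothetical helper.

```python
import numpy as np
from collections import Counter

def label_points(observations, label_maps, ignore_label=255):
    """Assign each 3D point the most frequent 2D ground-truth label among its observations.

    observations: list (one entry per 3D point) of (image_id, u, v) tuples,
        i.e. the pixel (column u, row v) where SfM observed the point.
    label_maps: dict image_id -> (H, W) integer label map rendered with the image.
    Points without any valid observation receive `ignore_label`.
    """
    labels = np.full(len(observations), ignore_label, dtype=np.int64)
    for i, obs in enumerate(observations):
        votes = Counter()
        for image_id, u, v in obs:
            lm = label_maps[image_id]
            col, row = int(round(u)), int(round(v))
            if 0 <= row < lm.shape[0] and 0 <= col < lm.shape[1]:
                votes[int(lm[row, col])] += 1
        if votes:
            labels[i] = votes.most_common(1)[0][0]   # majority vote across views
    return labels

maps = {0: np.ones((4, 4), dtype=np.uint8),                   # image 0: all class 1
        1: np.array([[2, 2, 1, 1]] * 4, dtype=np.uint8)}      # image 1: left half class 2
obs = [[(0, 1.0, 1.0), (1, 3.0, 1.0)],   # both views say class 1
       [(1, 0.2, 3.0)],                  # left half of image 1 -> class 2
       [(0, 9.0, 9.0)]]                  # outside the image -> unlabeled
labels = label_points(obs, maps)
print(labels.tolist())  # [1, 2, 255]
```

No model alignment or nearest-part search is involved: each label is a direct lookup, which is the efficiency argument made for the 2D-map approach above.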

Computation of point uncertainty information
To address the problems pointed out in the previous section, this research proposes a method to manage outliers by storing the uncertainty information of each point. In this research, point uncertainty is defined as the error between the ground truth depth of each point in an image and the corresponding depth estimated by SfM. The ground truth depth is obtained by looking up the ground truth depth maps rendered with the images, and the estimated depth is calculated from the 3D coordinates and the projection matrix available from the SfM results. The difference between the ground truth and estimated depths is calculated as the uncertainty information of each point and is stored with the point labels. By assigning different thresholds to the uncertainty information, point cloud data with different levels of outliers can be provided. For example, as shown in Figure 18, point clouds with different levels of outliers are obtained for threshold values of 100, 1, and 0.01. Using uncertainty information to manage point cloud data broadens the applications of the dataset generated by the proposed approach (e.g., applications related to SLAM), facilitating the automation of the bridge inspection process.
[Figure 18: The point clouds with different uncertainty threshold values. Only points that satisfy the threshold are plotted herein.]
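The uncertainty computation described above can be sketched with a pinhole camera model. This is a minimal sketch under several assumptions: the SfM points and cameras are already expressed at the scale of the synthetic scene, each view is given as intrinsics `K`, rotation `R`, and translation `t` with `x_cam = R @ X + t` and the optical axis along +z, depth maps store z-depth, and a point observed in several views keeps the largest error. The dictionary layout and function names are hypothetical.

```python
import numpy as np

def camera_coords(X, R, t):
    """World points X (N, 3) -> camera frame, with x_cam = R @ X + t (optical axis +z)."""
    return X @ R.T + t

def point_uncertainty(X, views):
    """Per-point uncertainty: largest |ground-truth depth - SfM depth| over the views
    that observe the point (taking the max over views is an assumption).

    views: list of dicts with keys K, R, t, depth (H, W ground-truth depth map),
           and ids (indices of the points observed in that view).
    Points never observed inside any image get uncertainty = inf.
    """
    unc = np.full(len(X), -np.inf)
    for v in views:
        ids = np.asarray(v["ids"])
        xc = camera_coords(X[ids], v["R"], v["t"])
        front = xc[:, 2] > 0                                  # in front of the camera
        ids, xc = ids[front], xc[front]
        uvw = xc @ v["K"].T                                   # P = K [R | t]
        uv = np.rint(uvw[:, :2] / uvw[:, 2:3]).astype(int)    # (u, v) = (column, row)
        h, w = v["depth"].shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        ids, uv, d_est = ids[inside], uv[inside], xc[inside, 2]
        err = np.abs(v["depth"][uv[:, 1], uv[:, 0]] - d_est)
        np.maximum.at(unc, ids, err)                          # safe with repeated ids
    unc[np.isneginf(unc)] = np.inf
    return unc

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
view = {"K": K, "R": np.eye(3), "t": np.zeros(3),
        "depth": np.full((100, 100), 10.0),       # a flat wall 10 units in front
        "ids": [0, 1, 2]}
X = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, 10.5], [0.0, 0.0, 13.0]])
unc = point_uncertainty(X, [view])
for th in (100, 1, 0.01):
    print(th, int((unc <= th).sum()))   # points kept: 3, 2, 1
```

Since the uncertainty is stored rather than applied, thresholding becomes a cheap, reversible choice made at dataset-formatting time, as in the 100 / 1 / 0.01 example above.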
Formatting datasets for a specific use
The point clouds with the annotations and uncertainty information discussed so far allow the proposed approach to adjust the dataset flexibly for specific problems, including the two scenarios considered in this research. The formatting process consists of three main steps: (1) coordinate transformation, (2) control of the outlier level using the uncertainty threshold, and (3) saving the data in specific file structures and formats. This research first converts the point cloud data from the SfM coordinate system to the world coordinate system (the coordinate system used by RBG) by estimating the transformation using the Random Sample Consensus (RANSAC) method (Fischler and Bolles, 1981). Using camera locations expressed in the SfM coordinate system …
[Figure: The point clouds after changing the coordinates.]
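One way to implement the RANSAC-based coordinate transform described above is sketched below. It assumes a 7-DoF similarity transform (scale, rotation, translation), which is natural because SfM reconstructions are defined only up to scale, estimated in closed form with the Umeyama method from corresponding camera locations in the SfM and world (RBG) coordinate systems. The model choice, threshold, iteration count, and function names are assumptions, not the exact settings of this research.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(xd.T @ xs / len(src))       # cross-covariance
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                                    # avoid reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

def ransac_similarity(src, dst, thresh, iters=500, rng=None):
    """Fit a similarity transform robustly; returns (s, R, t, inlier_mask)."""
    rng = np.random.default_rng() if rng is None else rng
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        s, R, t = umeyama(src[idx], dst[idx])
        resid = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = resid < thresh
        if inliers.sum() > best.sum():
            best = inliers
    s, R, t = umeyama(src[best], dst[best])                # refit on all inliers
    return s, R, t, best

# Synthetic check: SfM camera locations vs. world (RBG) camera locations
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
t_true = np.array([5.0, -3.0, 1.0])
rng = np.random.default_rng(0)
src = rng.random((20, 3)) * 50                      # SfM coordinates
dst = 2.5 * src @ R_true.T + t_true                 # world coordinates
dst[:3] += np.array([5.0, 5.0, 5.0])                # three hypothetical misaligned cameras
s, R, t, inl = ransac_similarity(src, dst, thresh=0.1, rng=rng)
```

Once estimated, the same transform maps every SfM point into world coordinates (`s * X @ R.T + t`), and cameras outside the inlier mask correspond to those treated as RANSAC outliers.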

After coordinate transform, outlier points (“noisy point” class) are defined by the thresholding discussed in the previous section, and point coordinates, point labels (including “noisy point” class), and transformation matrix are saved in a format required by the downstream task.
Data analysis
Analysis overview
[Table: Summary of the datasets. Note a: Number of point clouds after formatting.]
[Table: Summary of the analysis cases.]
The trained networks are evaluated by accuracy, intersection over union (IoU), mean accuracy, and mean intersection over union (mIoU) which are defined below:
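The formulas themselves are missing from this section. The following sketch assumes the standard definitions used in semantic segmentation, computed from a confusion matrix: for class c, accuracy_c = TP_c / (TP_c + FN_c) and IoU_c = TP_c / (TP_c + FP_c + FN_c), with mean accuracy and mIoU taken as averages over classes. Overall accuracy is included as well, since it is unclear here whether "accuracy" means the overall or the per-class value. Function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    counts = np.bincount(n_classes * y_true + y_pred, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)

def segmentation_metrics(cm):
    """Overall/per-class accuracy, per-class IoU, and their class means."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp            # class-c points predicted as another class
    fp = cm.sum(axis=0) - tp            # other points predicted as class c
    with np.errstate(divide="ignore", invalid="ignore"):   # absent classes -> nan
        acc = tp / (tp + fn)
        iou = tp / (tp + fp + fn)
    return {"overall_acc": tp.sum() / cm.sum(), "acc": acc, "iou": iou,
            "mAcc": np.nanmean(acc), "mIoU": np.nanmean(iou)}

cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
m = segmentation_metrics(cm)
# per-class accuracy [0.5, 1.0, 0.5], IoU [1/3, 2/3, 0.5], mAcc 2/3, mIoU 0.5
```

IoU penalizes both missed points (FN) and false alarms (FP), so it is stricter than per-class accuracy; this is why mIoU is usually lower than mean accuracy for the same predictions.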
Datasets
Real bridge dataset
This research uses datasets collected in the real world to train networks and validate the effectiveness of the proposed method. These datasets were previously developed and discussed in Kim et al. (2020, 2023), and the performance reported therein is used as the reference for this study.
(i) South Korea real bridge dataset.
Kim et al. (2020) used LiDAR to collect point clouds of seven bridges in South Korea, with total lengths ranging from 54 m to 272 m and span lengths ranging from 17 m to 38 m. The dataset was manually labeled into four classes: background, pier, pier cap, and slab. The point clouds were divided into 10 m subspaces along the bridge longitudinal direction with 80% overlaps. In each subspace, 4096 points were randomly sampled until all points were selected. The resulting dataset consists of 1461 point clouds of seven bridges, each containing 4096 points. Example point clouds in the South Korea real bridge dataset are shown in Figure 20.
[Figure 20: Example point clouds in the South Korea real bridge dataset.]
(ii) Illinois real bridge dataset.

This dataset was also developed by Kim et al. (2023) using the SfM method, based on images collected of a highway bridge located in Urbana, Illinois, United States. First, a commercial drone (DJI Mavic Air 2) was flown near the target surface, recording 10 min of video with its onboard camera at an image resolution of …
[Figure: The Illinois real bridge dataset.]
Compared with the LiDAR-based dataset shown in Figure 20, this SfM-based dataset contains various real-world complexities, such as nonuniform point density and more severe occlusions of structural components. This dataset was selected for validation in this research because of these real-world complexities, as well as the availability of reference results for 3D semantic segmentation of bridge components from previous research (Kim et al., 2023).
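The subspace partitioning described for the South Korea dataset can be sketched as below. This is one reading of the procedure under stated assumptions. The bridge longitudinal direction is taken as the x axis. The 80% overlap between 10 m subspaces is implemented as a 2 m stride. "Sampling 4096 points until all points are selected" is interpreted as randomly permuting the points of each subspace and cutting them into chunks of 4096, with the last chunk topped up by points redrawn from the same subspace. The function name is hypothetical.

```python
import numpy as np


def split_into_subspaces(points, length=10.0, overlap=0.8, n_points=4096, seed=0):
    """Return index arrays of exactly n_points each, covering every point.

    Assumes column 0 is the bridge longitudinal coordinate (metres).
    """
    rng = np.random.default_rng(seed)
    x = points[:, 0]
    x_max = x.max()
    stride = length * (1.0 - overlap)          # 10 m with 80% overlap -> 2 m
    start = x.min()
    samples = []
    while True:
        block = np.flatnonzero((x >= start) & (x < start + length))
        order = rng.permutation(block)
        for k in range(0, order.size, n_points):
            chunk = order[k:k + n_points]
            if chunk.size < n_points:          # top up the last chunk from the same block
                extra = rng.choice(block, n_points - chunk.size, replace=True)
                chunk = np.concatenate([chunk, extra])
            samples.append(chunk)
        if start + length > x_max:             # this block already reaches the bridge end
            break
        start += stride
    return samples
```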
Synthetic datasets
(i) Local dense point cloud dataset (SfM).
This research generated a local dense point cloud dataset of bridges using the proposed method. In total, 399 different bridge environments were generated using RBG, and 29,530 images were rendered in those environments. Dense point cloud data of the 399 bridges were then generated from these images. In some cases, not all cameras were aligned accurately because of the quality of the images collected along randomly generated trajectories. This research defined invalid cameras as those that were not aligned by the SfM algorithm or were identified as outliers by the RANSAC algorithm. If the number of invalid cameras was greater than
Example synthetic local dense point cloud data by SfM. The coordinates are normalized to the range [0,1] as preprocessing for PointNet++ [27].
(ii) Local dense point cloud dataset (sampling from mesh).

This research developed another dense point cloud dataset to compare the effectiveness of the proposed SfM-based data generation approach with the sampling-from-mesh method used in existing research. Using the same bridge models discussed in the previous section, 140 dense point clouds were generated by uniform sampling from the mesh surface. In this dataset, 10-m-long subspaces were selected at the same locations as in the SfM case, and point clouds were generated within these subspaces by sampling 4096 points. The dataset was formatted with the same data augmentation methods (flipping, scaling, and rotation), resulting in 2944 dense point clouds with 4096 points each. Example data are shown in Figure 23, where the point distribution differs significantly from that in Figure 22.
Example synthetic local dense point cloud data generated by the sampling from mesh method. Significant differences in point distribution can be observed compared with Figure 22.
(iii) Global sparse point cloud dataset (SfM).

This research developed the global sparse point cloud dataset by generating 220 different bridge environments using RBG, in which 83,118 images were rendered. SfM was then applied to obtain sparse point cloud data. As with the local dense point cloud dataset, invalid cases in which more than 40% of the cameras were not aligned or were aligned inaccurately were discarded during the automated annotation of the point cloud data. Then, 65,536 points were randomly selected from each bridge point cloud. A threshold value of 0.1 was applied to the uncertainty of the points, and points whose uncertainty exceeded the threshold were assigned to the “noisy point” class (note that such points were discarded in the local dense point cloud dataset rather than assigned the “noisy point” label). The final dataset contains point cloud data of 203 bridges, each with 65,536 points, as shown in Figure 24.
Example synthetic global sparse point cloud data obtained by SfM.
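The camera-validity check and the fixed-size subsampling used for the global sparse dataset can be sketched as follows. The 40% limit and the 65,536-point size come from the text. Two details are assumptions. Invalid cameras are counted as the sum of unaligned cameras and RANSAC outliers, on the premise that the two sets do not overlap. Points are drawn with replacement when a reconstruction has fewer than 65,536 points. Function names are hypothetical.

```python
import numpy as np


def is_valid_reconstruction(n_cameras, n_unaligned, n_ransac_outliers, max_invalid_ratio=0.4):
    """Keep a bridge only if at most 40% of its cameras are invalid.

    Assumes unaligned cameras and RANSAC outliers are disjoint sets.
    """
    invalid = n_unaligned + n_ransac_outliers
    return invalid <= max_invalid_ratio * n_cameras


def subsample_fixed(points, labels, n=65536, seed=0):
    """Randomly select n points (and their labels) from one bridge point cloud."""
    rng = np.random.default_rng(seed)
    replace = len(points) < n   # assumption: resample with replacement if too few points
    idx = rng.choice(len(points), size=n, replace=replace)
    return points[idx], labels[idx]
```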
Component recognition with local dense point cloud dataset
Task-Local1: Data augmentation performance of the synthetic local dense point cloud dataset
In this section, the data augmentation effect of the proposed data generation approach is evaluated by training PointNet++ on the South Korea real bridge dataset mixed with the synthetic local dense dataset. The mixed dataset, containing 4405 point clouds, was split randomly into a training set (60%) and a validation set (20%), following Kim et al. (2023). The PointNet++ network was trained on point cloud data scaled to coordinate values in the range [0, 1] for 50 epochs, with a mini-batch size of 8 and an initial learning rate of 0.0001. After training for component segmentation, the epoch immediately before significant overfitting was selected for testing on the Illinois real bridge dataset. The test results for the Illinois bridge are shown in Figure 25, and the accuracy and IoU of each class are shown in Table 7.
The ground truth and the predicted results for the Illinois bridge dataset (South Korea real bridge dataset mixed with the local dense SfM-based point cloud dataset).
Comparison of the results using the South Korea dataset only with those using the mixed dataset.
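The coordinate scaling and random split described above might look like the sketch below. It assumes per-axis min-max scaling of each point cloud. A single uniform scale factor that preserves aspect ratio would be an equally plausible reading of "scaled to [0, 1]". The remaining 20% of the indices is returned as a third subset, since the text does not state how that portion is used. Function names are hypothetical.

```python
import numpy as np


def normalize_unit_cube(points):
    """Per-axis min-max scaling of one point cloud to [0, 1] (assumed convention)."""
    mins = points.min(axis=0)
    span = points.max(axis=0) - mins
    span[span == 0] = 1.0              # avoid division by zero on flat axes
    return (points - mins) / span


def split_indices(n, train=0.6, val=0.2, seed=0):
    """Random train/validation/remainder split of n point clouds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]
```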
Compared with Kim et al. (2023), as shown in Table 7, the recognition performance for major bridge components, such as column and deck, was improved or remained high. The recognition accuracy of column increased significantly from 78.78% to 89.53%. The recognition accuracy of deck remained high, increasing slightly from 92.14% to 92.38%. However, the accuracy of the background and pier cap classes decreased, possibly because the point features of these two classes in the synthetic dataset generated by RBG differ significantly from those in the real Illinois bridge dataset. For example, in RBG, the background around the bridge is a plane area of
The confusion matrices of the results are shown in Figure 26. The main reason for the decreased mean accuracy and mean IoU was confusion between the column and pier cap classes, with 18% of the pier cap points incorrectly predicted as column. To investigate the data augmentation effect for classes that were modeled relatively well in RBG, columns in particular, the accuracy and IoU values excluding the pier cap class are presented in Table 8. The accuracy and IoU of the column class increased significantly from the values in Table 7, exceeding the reference case (Kim et al., 2023) by around 2 percentage points in accuracy and around 4 percentage points in IoU. The IoU of background also increased from 59.43% to 64.69%. The deck class did not improve but maintained relatively high accuracy and IoU. The mean accuracy (88.82%) and mean IoU (78.50%) also improved, with mean IoU increasing by around 2 percentage points. These results show that the proposed approach can effectively improve segmentation performance for components modeled with sufficient diversity in the synthetic environments, while further research is needed to increase the diversity of pier caps and backgrounds in the synthetic environments.
The confusion matrices of testing results.
The accuracy and IoU excluding the pier cap class.
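Statements such as "18% of the pier cap points were predicted as column" are read from a row-normalized confusion matrix. In such a matrix, each ground-truth row is divided by its total so that it sums to one. The minimal sketch below assumes that rows index ground-truth classes and columns index predicted classes. The function name is hypothetical.

```python
import numpy as np


def row_normalize(cm):
    """Fraction of each ground-truth class assigned to each predicted class.

    Rows with no ground-truth points are left as all zeros.
    """
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros_like(cm), where=totals > 0)
```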
Task-Local2: Robustness of the synthetic dataset generated using the SfM method in the real world
To validate the effectiveness of the proposed point cloud generation approach, this research compares the data augmentation performance of synthetic local dense point cloud data generated by the SfM method with that of synthetic local dense point cloud data generated by sampling from mesh. In addition to the testing results obtained in the previous section, PointNet++ was trained on the synthetic dataset generated by sampling from mesh, mixed with the South Korea real bridge dataset. The mixed dataset contains 4405 point clouds (60% for training and 20% for validation). PointNet++ was trained with the same hyperparameters as in Task-Local1. After training, the epoch immediately before significant overfitting was selected, and the network was tested on the Illinois real bridge dataset. The test results are shown in Figure 27, and the accuracy and IoU of each class are shown in Table 9.
The ground truth map and the predicted results of the Illinois bridge with the sampling from mesh method.
The testing results of PointNet++ on the Illinois bridge with local dense point clouds (sampling from mesh).
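The criterion "the epoch immediately before significant overfitting" is not specified further. One hypothetical way to operationalize it is a patience rule on the validation loss. The rule keeps the epoch with the lowest validation loss and stops scanning once the loss has failed to improve for `patience` consecutive epochs. The function name and the patience value are assumptions, not the settings used in this study.

```python
def select_epoch(val_losses, patience=3):
    """Index of the best epoch seen before validation loss stalls for `patience` epochs."""
    best_epoch, waited = 0, 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch, waited = epoch, 0
        else:
            waited += 1
            if waited >= patience:     # sustained lack of improvement: treat as overfitting onset
                break
    return best_epoch
```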
The results obtained with the synthetic local dense point clouds (SfM) were better on the real-world dataset than those obtained with the synthetic local dense point cloud dataset (sampling from mesh). In particular, the column accuracy improved from 72.98% to 89.53%, and the column IoU from 55.22% to 62.76%. The accuracy and IoU improved by around 5 percentage points for the deck, around 6 for the background, and around 9 for the pier cap. The mean accuracy (80.68%) and mean IoU (67.05%) obtained with the SfM-based synthetic dataset were around 9 percentage points higher than the mean accuracy (71.46%) and mean IoU (57.39%) obtained with the sampling-from-mesh dataset. These results show that the synthetic dataset generated by the SfM method better represents the real-world recognition task.
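For reference, the sampling-from-mesh baseline compared here can be sketched as area-weighted uniform sampling on a triangle mesh. A triangle is chosen with probability proportional to its area, and a point is drawn uniformly inside it using the standard square-root barycentric formula. This is a generic sketch, not the exact tool used to build the baseline dataset. The vertex/face array layout and the function name are assumptions.

```python
import numpy as np


def sample_mesh_uniform(vertices, faces, n_points, seed=0):
    """Uniformly sample points on a triangle mesh surface.

    vertices: (V, 3) float array; faces: (F, 3) integer vertex indices.
    Returns (n_points, 3) points and the index of the face each came from.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                   # (F, 3, 3)
    v0, v1, v2 = tri[:, 0], tri[:, 1], tri[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Square-root trick: uniform barycentric coordinates inside each triangle.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    a, b, c = v0[face_idx], v1[face_idx], v2[face_idx]
    points = ((1 - r1)[:, None] * a
              + (r1 * (1 - r2))[:, None] * b
              + (r1 * r2)[:, None] * c)
    return points, face_idx
```

Because this sampling ignores camera visibility, it produces neither occlusion gaps nor viewpoint-dependent density. This is consistent with the differences in point distribution noted between Figures 22 and 23.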
Task-Local3: Recognition performance using only synthetic local dense point clouds (SfM)
In this section, another PointNet++ model is trained using only the synthetic local dense point cloud dataset (SfM), without the South Korea real bridge dataset, to demonstrate the potential of the synthetic dataset to replace real-world data. Of the 2944 local dense point clouds, 60% were used to train PointNet++ for 50 epochs, with a mini-batch size of 8 and an initial learning rate of 0.0001. 20% of the South Korea real bridge dataset was used for validation, and the Illinois real bridge dataset was used for testing. The testing results are shown in Figure 28, and the accuracy and IoU are shown in Table 10.
The ground truth map and the predicted results of the Illinois bridge using only the synthetic local dense point cloud dataset (SfM).
The testing results of PointNet++ on the Illinois bridge using only synthetic local dense point clouds (SfM).
According to the results, major bridge components, such as column and deck, can be recognized reasonably well. The accuracy of both column and deck exceeded 90% (column 98.58%, deck 93.77%), showing the potential of synthetic local dense point clouds (SfM) for real-world component recognition. The accuracy of pier cap was lower (24.86%), which could be explained by the limited diversity of this component in the synthetic environments, as discussed previously. The accuracy of the background was nearly 0, indicating significant differences between the features of the synthetic background and those of the real background in the Illinois dataset. Reducing this difference is a direction for future research.
Overall, the results of these three tasks demonstrate the effectiveness of the synthetic local dense point cloud dataset (SfM) generated by the proposed approach for real-world component recognition. The local dense point clouds (SfM) can be applied to data augmentation for multiple types of bridges (particularly when compared with the sampling-from-mesh approach) and could potentially replace real-world data in the future.
Component recognition with synthetic global bridge point cloud dataset (SfM)
The testing results of RandLA-Net on synthetic data with global sparse point clouds (SfM).

The ground truth and prediction of global bridges.
According to the results, major components were recognized successfully, showing the effectiveness of global sparse point clouds for understanding the bridge structure and its components. Moreover, the network learned not only to classify each point into component classes but also to clean the noisy point cloud data by explicitly recognizing the outlier points defined by the uncertainty threshold. The arch, abutment, column, pier cap, and background classes were recognized with above 90% accuracy, in particular arch (99.28%), abutment (98.77%), and column (93.73%). The accuracy of deck, girder, beam, and parapet was above 80%: parapet 89.60%, girder 89.30%, deck 84.81%, and beam 81.62%. Thus, more than half of the classes achieved accuracy above 80%. The accuracy of slab, sleeper, and bearing was higher than 60%. The accuracy of cable, track, and noisy points was around 30%, which could be partially explained by the small number of points belonging to those classes. The cable top and cable base classes had zero accuracy. These components appear only on cable-stayed and suspension bridges, and such minor components rarely appear in the sparse point clouds obtained by the SfM method. Overall, the component recognition results on global sparse point clouds show the potential of the proposed approach for automatically recognizing various structural components from quick scans.
Future extensions
The effectiveness of the synthetic data generated by the proposed approach depends on the realism of the synthetic environments, as well as on the domain gap between the synthetic and real-world environments. Possible improvements to the proposed approach include: (1) adding procedural rules or incorporating deep learning-based generative algorithms in RBG to increase the variety of generated bridges, e.g., additional side spans and pier caps of different sizes and shapes; (2) importing more complex backgrounds into the synthetic environments to simulate the surroundings of the bridge; and (3) incorporating Unsupervised Domain Adaptation (UDA), a technique that adapts deep learning-based algorithms trained in one domain (the synthetic dataset) to another domain (the real-world dataset). Preliminary results of incorporating UDA into 2D bridge component recognition have been reported by Narazaki et al. (2024).
Conclusions
This research developed an approach for generating large-scale synthetic bridge point cloud datasets. The approach collects 2D images in synthetic bridge environments by simulating the UAV image collection process, and then uses Structure from Motion to obtain a large-scale synthetic point cloud dataset with automatic ground truth annotation. This research also proposed computing the uncertainty of each point to control the level of outliers. First, RBG was used to produce synthetic environments of different bridges randomly, automatically, and procedurally. Then, camera trajectories were created by simulating UAV-based image collection. The 3D point clouds were reconstructed from the collected images by the SfM method. Finally, the point clouds were automatically labeled using ground truth annotations obtained from the synthetic images. In addition, the datasets stored the uncertainty of each point, defined as the error between the ground truth depth and the depth obtained from the SfM results. By applying a threshold to this uncertainty information, point cloud data with different levels of outliers can be provided. Using the proposed method, two UAV-based data collection scenarios were simulated: local dense point cloud collection and global sparse point cloud collection. The local dense and global sparse point clouds were used to train PointNet++ and RandLA-Net, respectively, for bridge component recognition. PointNet++ was tested on the real-world bridge dataset, and RandLA-Net was tested on synthetic point clouds. The testing results support the following four findings: (1) Local dense point clouds generated by the proposed approach can be used for data augmentation, especially for major components such as column and deck. (2) The synthetic dataset generated by the SfM method is more robust on real-world data than the synthetic dataset generated by sampling from mesh. (3) The local dense point cloud dataset generated by the proposed approach has the potential to be used for component segmentation on real-world data, with 98.58% accuracy on column and 93.77% accuracy on deck. (4) The global sparse point cloud dataset generated by the proposed approach has the potential to be used for understanding bridge structures and components: 9 of the 16 classes exceeded 80% accuracy, including arch (99.28%), abutment (98.77%), and column (93.73%).
Future work includes (1) investigations to reduce the domain gaps, (2) additional extensive validation using various real-world datasets, and (3) extensions to a broader range of bridge inspection problems, including point cloud-based surface damage recognition. With these investigations, the proposed approach can not only provide sufficient point cloud data for 3D bridge visual recognition but can also be extended to other areas of bridge inspection, such as SLAM and BrIM, facilitating the development of autonomous bridge inspection systems.
Acknowledgments
The authors acknowledge the financial support from the National Natural Science Foundation of China (Grant No. 42250410334, 52361165658). The last author acknowledges the valuable research and career guidance provided by Professor Yozo Fujino since his undergraduate research.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant No. 42250410334, 52361165658).
