Fusion of residual UNet image segmentation model and digital twin for object repair system

Abstract

Traditional object repair methods usually rely on complex algorithms and extensive manual labor, which makes it difficult to repair large and complex objects. To improve the efficiency of repairing large and complex targets and reduce the workload of object repair, an object repair model based on digital twin technology is developed. A charge-coupled device mounted on the robotic arm collects image data from the target object and transfers it to the real-virtual coordinate matching module. An attention mechanism is introduced to construct a multiscale residual UNet object image segmentation model for real-virtual coordinate matching, which supports object repair trajectory planning. The loss value, the number of model parameters, and the object repair error are used as evaluation indicators. In the performance test, the proposed multiscale residual UNet model achieved an accuracy of 0.942 with a loss value of 0.099 at steady state. Compared to the traditional UNet model, the proposed model had 20.02 million fewer parameters and a slightly higher prediction accuracy (by 0.01) on the self-built dataset. Additionally, the attention module improved the prediction accuracy by 0.02 without adding many parameters. The experiment demonstrated that the deviation between the predicted and actual object center coordinates was less than 1 mm, both horizontally and vertically. This sub-millimeter accuracy allowed precise virtual-to-real alignment, which is essential for operating high-fidelity digital twins, and ensured reliable robotic manipulation in demanding applications such as precision component repair and cultural heritage restoration. Furthermore, a dual-robot cooperative approach proved more effective in completing complex repair tasks: it increased the repair capability of the system and enabled omnidirectional repairs previously unattainable with a single robot arm. This dual-robot strategy reduced the time required for complex object repair to under 15 s, a 40% improvement in speed over single-robot operation, while maintaining a repair pass rate of 99.2%.

Keywords

UNet image segmentation, object restoration, attention mechanisms, digital twins

Introduction

With continuous innovation and breakthroughs in technology, industrial robot arms are being applied ever more widely and deeply in many fields, especially in equipment maintenance and repair, where their role is becoming increasingly prominent.^1,2 In recent years, the share of maintenance tasks performed by industrial robot arms has increased significantly year by year owing to their performance advantages.³ These intelligent robotic arm systems can efficiently and reliably replace manual labor in maintenance tasks that involve harsh environments, complex operations, or high risk (such as high temperatures, high pressures, toxicity, confined spaces, or repairs of precision components). Moreover, they offer excellent repeatability, stable motion control, and the ability to accurately replicate complex trajectories, which markedly improves the quality and efficiency of maintenance operations.^4,5 More importantly, robotic arms free operators from dangerous or harsh frontline environments and allow them to take on safer and more efficient tasks such as remote monitoring, decision-making, and programming management. The use of robotic arms therefore improves the overall working environment for workers and reduces the risk of occupational health hazards and safety incidents.^6,7 This technology is reshaping the traditional operation mode and safety standards of the maintenance industry.⁸ Traditional manual maintenance is often limited by human factors, such as operator fatigue and differences in skill level, which may affect the consistency and reliability of maintenance results. In contrast, a robotic arm can ensure the quality and effectiveness of every repair through precise control. Trajectory planning is an important factor affecting the quality of robot repair. 
When a robotic arm is used for object repair (OR), its motion trajectory must be planned in advance to ensure that the repair proceeds smoothly. Repairing large and complex targets and planning the corresponding motion trajectory involve a very large number of parameters, which makes existing robot arm OR techniques inefficient and costly for such targets. The prevailing approaches rely on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which are effective in standardized experimental scenarios. However, their accuracy and efficiency degrade in practical applications involving noise interference, object differences, and non-standard environments. ResNet, SegNet, L-UNet, and similar networks are commonly used for trajectory planning and are also easy to modify. Existing digital twin restoration technology relies on traditional image segmentation techniques, which often segment complex geometric shapes and textures inaccurately. This reduces the fidelity of the virtual model and leads to misalignment between the digital twin and physical reality, which hinders accurate trajectory planning and execution. Therefore, to improve the efficiency of the robotic arm in repairing large and complex targets and to reduce the computational cost of trajectory planning, the study proposes using digital twin technology (DTT) to construct a virtual model of the robotic arm and using the UNet network structure to calculate and optimize trajectory planning parameters, thereby reducing the number of parameters involved in trajectory planning.

The study proposes using a UNet network to extract and recognize image features of target objects, and using DTT to construct a virtual model of the target object and design the motion trajectory of the robotic arm during the repair process. The main contributions are a framework for repairing large complex targets and a virtual model of a robotic arm for such repairs. The study improves the repair efficiency for large complex targets and reduces the repair cost.

The study comprises four parts. The first part provides an overview of the UNet model and the application of DTT. The second part proposes an OR system that employs DTT and integrates attention mechanisms (AMs) to construct the multiscale residual UNet object image segmentation (MR-UNet-OIS) model. The third part examines and evaluates the performance of the system and the model. The fourth part presents a summary of the aforementioned parts.

Related works

Research on robotic repair and rehabilitation systems has been conducted in various fields. For instance, Keng et al. focused on the new prospects and distinctive challenges of combining advanced robotics with polyurethane (PU) rehabilitation technology for utility pipe maintenance. They proposed an automated rehabilitation process that addressed complex robot attitude alignment and PU curing through the use of a protective sleeve and PU foam. The experimental results showed high precision and good repair quality.⁹ Xiaodong et al. aimed to develop an intelligent mobile robot platform with integrated non-destructive evaluation (NDE) capabilities for autonomous real-time inspection and repair. The robot design integrated components of robotics, NDE, and artificial intelligence (AI). Experimental results demonstrated the system’s capabilities in NDE and repair, which were expected to improve personal safety, prevent structural damage, and reduce maintenance costs.¹⁰ Xu et al. proposed a climbing robot system based on independent quadrilateral suspension for the inspection and repair of cables on cable-stayed bridges. The robot integrated automatic repair mechanisms, including grinding, cleaning, and spraying, and could automatically complete the inspection and repair of cables. Experimental results showed that the robot was able to climb at a speed of 0.26 m/s, carry a load of 11.5 kg, and cross an obstacle 15 mm high, which effectively improved the automation level of cable repair.¹¹ Digital twins (DTs) are used to simulate and predict physical systems in a wide range of applications in industry, healthcare, and urban planning. For example, Christina et al. proposed a novel DT model for enhancing the performance of complex manufacturing systems, which was integrated into a multi-agent network of cyber-physical systems (CPSs). The model introduced a new “monitoring agent” that automatically communicated with other agents at different levels to detect anomalies in sensor data and identify bottlenecks in operations. Experimental results showed an average improvement of 30% in human resource utilization.¹²

Bertoni M. et al. focused on the application of DT in the product service system (PSS) domain, exploring the extent to which DT had been applied in the PSS lifecycle and whether DT-related case studies could be captured from a comprehensive values perspective. Through a systematic literature review and mapping study, the study revealed that very little existing literature showed how to leverage real-time physical-to-virtual and virtual-to-physical connections to improve the design of servitized solutions.¹³ Lu et al. explored the application of DT and sixth-generation (6G) mobile networks in the industrial Internet of Things (IoT), especially in edge intelligence. A blockchain-based federated learning framework was proposed by introducing a DT wireless network (DTWN) to migrate real-time data processing and computation to the edge layer. Numerical results showed that the scheme improved efficiency and reduced cost compared to benchmark learning approaches. UNet is a deep learning architecture mainly used for image segmentation (IS) tasks. Owing to its excellent performance and efficient architecture, UNet has been widely used in several application areas.¹⁴ Mendu presented a novel deep learning-based alternative model for solving high-dimensional uncertainty quantification and uncertainty propagation problems. The deep learning architecture was developed by integrating the UNet architecture with a Gaussian gated linear network (GGLN). The network architecture of GLU-net was relatively simple and had 44% fewer parameters than existing work. GLU-net was shown to be accurate and highly efficient even without information about the input structure.¹⁵ Tsai et al. pursued high-precision hand segmentation while reducing parameters and increasing inference speed. They proposed a new hand segmentation technique, refined UNet, based on the original UNet. 
Substantial performance improvement was achieved by refining the segmentation results and reducing the gap between feature vectors. Experimental results indicated that, at inference time, refined UNet could prune the refinement block to increase speed and reduce the number of parameters. Moreover, refined UNet achieved 200 frames per second (FPS) on a GPU (RTX 2080Ti).¹⁶ The combination of deep learning and industrial applications has also been a popular research field recently. Nemani et al. studied the application of deep learning to the remaining life prediction of bearings. With the fault threshold of bearings determined according to the ISO standard, a two-stage LSTM model was proposed for extracting fault feature signals of bearings, and a Gaussian layer was embedded in the LSTM model for parameter optimization. The experimental results showed that the model predicted bearing life with good accuracy.¹⁷ Han et al. proposed a CNN-M2R network with multilayer fusion and multidimensional attention to improve vehicle detection performance in urban areas. The method used a multidimensional attention network to highlight target convergence, together with a new hard-easy positive and negative sample balanced sampling strategy and a global balanced loss function to deal with spatial imbalance and target imbalance. The experimental results showed greatly improved detection performance compared to SSD, LRTDet, RFCN, and DFPN.¹⁸ Tian et al. proposed a deep learning-based steel surface defect detector, DCC-CenterNet, aiming to achieve an optimal balance between speed and accuracy. First, the detector employed an expansive feature enhancement model to expand its receptive field. Second, a new centrality function was introduced to improve the accuracy of keypoint estimation. The experimental results showed that the detector was able to detect steel surface defects effectively and efficiently, resolving the conflict between speed and accuracy.¹⁹

In summary, scholars have developed numerous automated image processing schemes for the restoration recognition problem. However, these schemes tend to exhibit poor object recognition accuracy and struggle with complex scenes and multiscale images. DTT, meanwhile, is capable of real-time simulation and prediction. When it is used in conjunction with the UNet segmentation model, a more efficient OR system can be achieved. This not only increases restoration accuracy but also reduces maintenance costs to some extent, which gives the approach strong application potential.

Design of an OR system incorporating IS modelling and DTT

To design an automated system suitable for performing OR, the study first designs the overall OR system based on DTT and plans the overall workflow of the machine. Afterwards, the coordinate matching function of machine vision is designed, and AM is introduced to construct the MR-UNet-OIS model to realize automated recognition and planning of virtual and real coordinates.

Overall design of DTT-based OR system

Trajectory programming is an important factor affecting the quality of robot repair. Because of its low efficiency and poor accuracy, manual teach programming cannot meet complex and varied repair needs. Therefore, offline programming based on DTT has become widely used. It improves the operator’s programming environment and effectively increases the efficiency and accuracy of trajectory planning. DTT-based offline programming proceeds as follows. First, a simulation system that mirrors the real environment is built by three-dimensional modeling, and the virtual and real coordinates of the operating objects are matched. Then, the motion trajectory is generated by the trajectory planning strategy, and the motion of the robot arm is simulated according to the robot kinematics. During the simulation, the motion attitude and collision situation are observed, and collision points, unreachable points, and axis over-limit points are adjusted to ensure the reliability of the motion trajectory. Finally, the planned motion trajectory is transformed into numerical control code for the robot arm, and the post-processed numerical control code controls the robot arm to complete the repair task. The OR system designed in the study is a comprehensive framework that includes two main aspects: hardware and software. In terms of hardware, the study designs and selects the robot, vision sensor, and end-effector according to performance indices derived from the overall program and workflow. On the software side, the study performs camera calibration and calibrates the robot reference coordinates. 
The OR system is mainly composed of three core components: real-virtual coordinate matching, motion trajectory planning on the simulation side, and trajectory post-processing and operation on the physical side. The real-virtual coordinate matching module acquires image data of the target object using a vision sensor. Then, the coordinate prediction algorithm determines the object’s spatial coordinates, which are verified in the simulation environment. In the trajectory planning module at the simulation end, the repair trajectory is generated based on the 3D model of the target object and verified by simulation through kinematics. In the trajectory post-processing and physical operation module, the generated repair trajectory is converted into CNC code to control the precise movement of the robot arm. In order to repair the object in all directions, the study adopts a two-machine cooperative strategy to ensure the accurate completion of the repair task. With DTT, the OR system achieves a high degree of integration and automation between the components and is able to provide an efficient and accurate solution for OR.

The overall framework of the OR system is shown in Figure 1. Image data of the target object are first acquired using the robot arm’s charge-coupled device (CCD). These data are then passed to the real-virtual coordinate matching module, where the actual spatial coordinates of the object are determined by a coordinate prediction algorithm. This step is the starting point of the entire OR process and provides the input for the subsequent motion planning on the simulation side and the operation on the physical side. After coordinate verification, the process enters the motion trajectory planning module on the simulation side, which generates the repair trajectory based on the previously obtained 3D model of the object. After the trajectory is generated, it is verified by kinematic simulation to ensure its feasibility. The process then moves to the trajectory post-processing and physical-side operation module. At this stage, the repair trajectories generated on the simulation side are converted into CNC code, which guides the movement of the robot arm on the physical side. The movement of the robotic arms is controlled by a dual-robot coordination strategy to ensure that the repair task follows the predetermined trajectory. The entire OR system is highly integrated through DTT, and the modules are connected through clear logic and data flow. The system hardware consists of five main parts: the repair platform, the repair object, the host computer, the robot and end-effector, and the camera. The system adopts a dual robotic arm design, which supports both single-robot repair and dual-robot collaborative repair. Considering the accuracy requirements of the OR system, the pixel depth is 8 bit and the resolution must be at least 1280 × 960; to achieve good real-time performance, the camera frame rate must exceed 20 fps. The MER2-231-41U3M/C camera is therefore used.

Figure 1.

Overall framework of object restoration system based on digital twin idea.

The robotic arms used in the OR system are shown in Figure 2. In this study, GSK RB06-900 and KUKA KR3-R540 robots are used as the operating hardware of the repair system. In the OR system, the industrial robot serves as the main execution device, and its performance affects the speed and accuracy of the repair. Several performance parameters are key. The degrees of freedom indicate movement flexibility and the complexity of the tasks the robot can perform. The maximum range of motion is affected by the end-effector. The payload is selected based on the weight of the repair object and torch. The motion speed describes the point-to-point movement speed. The repeatability relates to task accuracy and stability. The key to the end-effector design is to ensure stable gripping of the object. The study adopts a pneumatic gripper with a flared jaw shape to increase the error tolerance. Inside the jaws, conical cylinders and trapezoidal shapes constrain the object to keep it stable during gripping and rotation. The opening and closing of the gripper are controlled by an OTS-750 air pump with a pressure of 7.0 bar and an air flow of 60 L/min.

Figure 2.

Model of large complex target repair robot.

In the DT-based robotic OR system, the stability and intelligence of the software framework are key; the software flow is shown in Figure 3. The framework consists of three main modules: the human-computer interaction interface of the robot’s offline programming simulation side, the machine-vision virtual-real coordinate matching module, and the communication module between the offline programming simulation side and the industrial robot. The human-computer interaction interface is responsible for importing and displaying 3D CAD models, generating and adjusting repair trajectories, and creating and simulating robot kinematic models. The virtual-real coordinate matching module covers CCD image acquisition, position information prediction, and the coordinate matching strategy. The communication module provides several control options, such as continuous operation, single-step commissioning, and dual-robot cooperative operation, and handles the encapsulation of communication packets and data transmission. The software is developed on the Visual Studio 2019 (VS2019) platform and uses MFC, OpenGL, and Open CASCADE to implement the graphical display interface. MFC provides a convenient GUI development environment, while OpenGL and Open CASCADE are used for modelling and displaying 3D CAD models. The offline programming software is built with C++, .NET, and the OpenGL graphics libraries. On the simulation side, the user can import the CAD models of the robot, repair objects, and end-effector and set the relevant parameters. The system adopts the real-virtual coordinate intelligent matching method to establish the coordinate correspondence between the real objects and the simulated objects. The trajectory generation algorithm is responsible for motion trajectory planning, while the D-H method is used to establish the kinematic equations of the robot. 
The repair trajectory is finally converted to CNC code and sent to the real robot via socket communication to complete the repair task.
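The D-H step mentioned above can be illustrated with a minimal forward-kinematics sketch. It assumes the standard (distal) D-H convention, which the text does not specify, and uses a hypothetical planar two-link arm rather than the actual GSK or KUKA geometry; the function names `dh_transform` and `forward_kinematics` are illustrative only.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive links (standard D-H convention, assumed)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_rows):
    """Chain the per-link transforms to get the end-effector pose in the base frame."""
    pose = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_rows):
        pose = pose @ dh_transform(theta, d, a, alpha)
    return pose

# Hypothetical planar two-link arm as (d, a, alpha) rows, link lengths in metres.
PLANAR_ARM = [(0.0, 0.5, 0.0), (0.0, 0.3, 0.0)]

# First joint at +90 degrees, second joint at -90 degrees.
tip = forward_kinematics([np.pi / 2, -np.pi / 2], PLANAR_ARM)[:3, 3]
print(np.round(tip, 6))
```

For this toy arm the tip lands at approximately (0.3, 0.5, 0): the first link points along +y and the second link is rotated back to point along +x. A six-axis robot would use six such rows taken from its datasheet.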

Figure 3.

Large and complex target repair process based on the robot repair system.

Multiscale UNet IS model design based on residuals and AM

To realize a DT-based robotic OR system, the simulated coordinates of the objects must first match the real coordinates, so the study designs an intelligent matching method for virtual and real coordinates based on machine vision. To reduce the virtual-real coordinate matching error, the study introduces AM to optimize the MR-UNet-OIS model. The UNet model is a deep learning architecture for IS, originally proposed by Olaf Ronneberger et al. in 2015, with a focus on biomedical IS. The model has an encoder-decoder structure: the encoder captures contextual information from the input image, while the decoder localizes precisely and recovers image details. The two are connected by skip connections for multiscale feature fusion. The structure of the proposed multiscale UNet IS model based on residuals and AM is shown in Figure 4. The model has a left-right symmetric structure consisting of a feature extraction network and a segmentation prediction network. The feature extraction network uses the cross-stage-partial-block (CSP Block) residual module to reduce the number of parameters while obtaining high-quality contextual features. In the segmentation prediction phase, the model uses skip connections to fuse features at different scales, which compensates for the feature loss caused by the max pooling layers. At the same time, an attention module is introduced to enhance the spatial and positional representation of the features. Finally, the segmented image is output by a single-layer convolutional network.

Figure 4.

Schematic diagram of multiscale UNet image segmentation model structure based on residual and attention mechanism.

The traditional UNet extracts feature maps of the input image at multiple depths by continuously stacking layers of small convolutional kernels, but the resulting parameter redundancy reduces feature extraction efficiency. The study therefore designs the CSP Block multiscale residual module, whose structure is shown in Figure 5. The module reduces the computational cost and retains multiscale object features through multilevel residual connections. The study selects CSP Block as the core residual module because of its computational efficiency, feature reuse capability, and resistance to vanishing gradients. These advantages are well suited to the real-time and parameter-sensitivity requirements of this task. CSP Block significantly reduces parameters and computation while maintaining or improving the model’s feature extraction capability, which makes it a suitable choice for balancing performance and efficiency in this study. The feature extraction network first uses a 3 × 3 convolutional kernel to obtain the coded features $P_{1}$ and $F_{1}$. The coded features are then passed through three CSP Block modules to output the features $F_{2}$, $F_{3}$, and $F_{4}$. The $j$th CSP Block takes the coded feature $P_{j}$ of size $C_{j} \times H_{j} \times W_{j}$ as input and splits it along the channel dimension, as shown in equation (1).²⁰

$$P_{Aj} = \varsigma\left(\mathrm{Cov}(P_{j}),\, 1\right)$$

(1)

Figure 5.

Multiscale residual modules in the CSP structure of the UNet network.

In equation (1), $\mathrm{Cov}$ represents the convolution operation and $\varsigma$ is the channel splitting that extracts the target part of the features. Using the split feature $P_{Aj}$ effectively reduces the number of parameters required. The directly connected feature $P_{Bj}$ is then output by a convolution operation, as shown in equation (2).

$$P_{Bj} = \mathrm{Cov}(P_{j})$$

(2)

The directly connected features effectively preserve the initial input features. After the first-level residual stage, the second-level residual is computed from $P_{Aj}$ through equation (3).

$$P_{mj} = \mathrm{Cov}\left(\mathrm{Cov}(P_{Aj})\right) \oplus \mathrm{Cov}(P_{Aj})$$

(3)

In equation (3), $P_{mj}$ denotes the output features of the second-level residual and $\oplus$ denotes the channel fusion operation. The multiscale feature $F_{j}$ is then obtained through equation (4).

$$F_{j} = \mathrm{Cov}(P_{mj}) \oplus P_{Bj}$$

(4)

In equation (4), the convolved $P_{mj}$ and $P_{Bj}$ are fused to obtain $F_{j}$. Once the object feature $F_{j}$ is obtained, the design of the segmentation prediction network becomes the key factor in achieving accurate IS. To handle the skip connections between the feature extraction network and the segmentation prediction network, the up-sampled features at each stage are defined as $G_{1}$ to $G_{4}$.
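The data flow of equations (1) to (4) can be traced with a small shape-level sketch. Several points are assumptions rather than details from the text: every convolution is replaced by a randomly initialised 1 × 1 convolution (`conv1x1`, a hypothetical helper), the split keeps the first half of the channels, channel fusion $\oplus$ is read as concatenation, and the two $\mathrm{Cov}(P_{Aj})$ terms in equation (3) share one convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_channels):
    """Stand-in for Cov: a randomly initialised 1x1 convolution (per-pixel channel mix)."""
    weight = rng.standard_normal((out_channels, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", weight, x)

def csp_block(p_j):
    """Trace equations (1)-(4) for one feature map of shape (C, H, W) with even C."""
    channels = p_j.shape[0]
    half = channels // 2
    # Eq. (1): convolve, split the channels, keep the first part.
    p_a = np.split(conv1x1(p_j, channels), 2, axis=0)[0]
    # Eq. (2): directly connected branch.
    p_b = conv1x1(p_j, half)
    # Eq. (3): second-level residual, fused along the channel axis.
    inner = conv1x1(p_a, half)
    p_m = np.concatenate([conv1x1(inner, half), inner], axis=0)
    # Eq. (4): fuse the processed branch with the direct branch.
    return np.concatenate([conv1x1(p_m, half), p_b], axis=0)

features = csp_block(rng.standard_normal((8, 16, 16)))
print(features.shape)  # (8, 16, 16)
```

Only half of the channels pass through the stacked convolutions, which is where the parameter saving of the split comes from; a trained model would use learned kernels and, most likely, normalisation and activation layers that are omitted here.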

Existing methods reduce information loss by fusing $G_{i}$ with $F_{i}$ of the same dimensionality, which may introduce interfering information. Therefore, the study introduces the attention module shown in Figure 6 to enhance the spatial and positional representation of the fused feature $S_{i}$ and suppress interference. The initial feature map $S_{i}$ has dimension $C_{1i} \times H_{1i} \times W_{1i}$. Channel compression is performed by a single-layer 3 × 3 convolutional network to reduce the computational cost of the attention module, giving the compressed feature $T_{i}$. The feature map is then encoded along the horizontal and vertical dimensions separately: the encoded features $T_{W_{1i}}$ and $T_{H_{1i}}$ are extracted in each channel by pooling operations along the vertical coordinate $(H, 1)$ and the horizontal coordinate $(1, W)$.^21-23 The encoding at vertical coordinate $h$ is shown in equation (5).

$$T_{W_{1i}}(h) = \frac{1}{W_{1i}} \sum_{n=1}^{W_{1i}} x_{c}(h, n)$$

(5)

Figure 6.

Improved attention modules in the UNet network structure.

In equation (5), $x_{c}(h, n)$ denotes the feature value at point $(h, n)$ in channel $c$. For the horizontal coordinate $w$, the encoding is given in equation (6).

$$T_{H_{1i}}(w) = \frac{1}{H_{1i}} \sum_{m=1}^{H_{1i}} x_{c}(m, w)$$

(6)

These encoded features enhance the model’s representation in both the vertical and horizontal directions. The encoded features are then integrated by channel fusion and a convolution operation followed by the h-swish activation function, and the integrated feature $T_{Si}$ is computed as shown in equation (7).

$$T_{Si} = \sigma\left(\mathrm{Cov}\left(T_{W_{1i}} \oplus T_{H_{1i}}\right)\right)$$

(7)

In equation (7), the integrated feature $T_{Si}$ has output size $D_{i} / r \times 1 \times (W_{1i} + H_{1i})$, and $\sigma$ denotes the h-swish activation function, which is given in equation (8).

$$\sigma(x) = x \, \frac{\min\left(\max(x + 3,\, 0),\, 6\right)}{6}$$

(8)
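Expanding the nested $\min$ and $\max$ in equation (8) gives an equivalent piecewise form, which makes the behaviour of the activation easier to read:

$$\sigma(x) = \begin{cases} 0, & x \le -3 \\ \dfrac{x(x + 3)}{6}, & -3 < x < 3 \\ x, & x \ge 3 \end{cases}$$

For example, $\sigma(1) = 1 \cdot 4 / 6 = 2/3$, while any input of 3 or more passes through unchanged.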

After that, the integration coefficients $a_{W_{1i}}$ and $a_{H_{1i}}$ are predicted for the vertical and horizontal dimensions by splitting the integrated feature $T_{Si}$, as shown in equation (9).

$$\begin{cases} a_{H_{1i}} = \lambda\left(\mathrm{Cov}\left(\phi(T_{Si}, H)\right)\right) \\ a_{W_{1i}} = \lambda\left(\mathrm{Cov}\left(\phi(T_{Si}, W)\right)\right) \end{cases}$$

(9)

In equation (9), $\phi$ denotes splitting the feature along the given dimension, and $\lambda$ denotes the Sigmoid activation function. $a_{W_{1i}}$ and $a_{H_{1i}}$ express the importance of positional and spatial information and are used to compute the final output feature $A_{i}$, as shown in equation (10).

$$A_{i} = T_{i} \otimes a_{H_{1i}} \otimes a_{W_{1i}}$$

(10)

The segmentation prediction network obtains an accurate description of the object segmentation information after four up-sampling operations as well as four attention modules $A_{4}$. The segmented image is predicted by a 1 × 1 convolution and has the same size as the input image.
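The attention computation in equations (5) to (10) can be traced end to end in a short sketch. It makes several assumptions: every convolution is replaced by a randomly initialised 1 × 1 channel mix (`mix_channels`, a hypothetical helper), the reduction ratio $r$ is set to 2, and the input is taken to be the already compressed feature $T_{i}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def h_swish(x):
    """Equation (8), applied element-wise."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mix_channels(x, out_channels):
    """Stand-in for Cov: a randomly initialised 1x1 convolution over the channel axis."""
    weight = rng.standard_normal((out_channels, x.shape[0])) * 0.1
    return np.einsum("oc,c...->o...", weight, x)

def coordinate_attention(t_i, reduction=2):
    """Trace equations (5)-(10) for a compressed feature map T_i of shape (C, H, W)."""
    channels, height, width = t_i.shape
    t_w = t_i.mean(axis=2)                      # eq. (5): average over width  -> (C, H)
    t_h = t_i.mean(axis=1)                      # eq. (6): average over height -> (C, W)
    fused = np.concatenate([t_w, t_h], axis=1)  # channel-wise fusion          -> (C, H + W)
    t_s = h_swish(mix_channels(fused, channels // reduction))   # eq. (7)
    part_h, part_w = np.split(t_s, [height], axis=1)            # phi in eq. (9)
    a_h = sigmoid(mix_channels(part_h, channels))[:, :, None]   # (C, H, 1)
    a_w = sigmoid(mix_channels(part_w, channels))[:, None, :]   # (C, 1, W)
    return t_i * a_h * a_w                                      # eq. (10)

t_i = rng.standard_normal((4, 6, 5))
a_i = coordinate_attention(t_i)
print(a_i.shape)  # (4, 6, 5)
```

Because both coefficient maps come from a Sigmoid, each output value is the input scaled by a factor between 0 and 1 that depends on its row and its column, so the output keeps the input shape while positions judged less important are attenuated.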

Performance evaluation and results

To verify the model’s ability to generalize, this study combines a publicly available dataset with a self-built dataset of repair objects. The public dataset is the handbag open dataset, a synthetic handbag segmentation dataset derived from the FreiHAND hand pose dataset (https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html). It contains 550 RGB images with pixel-level annotation masks. The self-built dataset is captured using an MER2-231-41U3M/C industrial camera and includes five typical restoration objects: ceramic cups, screws, cross pipes, terracotta figurine heads, and turbine blades. A total of 300 high-resolution images are taken, covering samples that differ in shape, size, and texture. The LabelMe tool is used for pixel-level semantic segmentation annotation. Examples of images in the dataset are shown in Figure 7.

Figure 7.

Example of some images in the experimental dataset (source: https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html).

The repair trajectory planned on the simulation side must be transferred to the real robotic arm, which is then controlled to complete the corresponding repair movement. The study first converts the planned repair trajectory into numerical control code for the robotic arm. Data exchange between the offline programming simulation side and the industrial robot is then completed, and finally the repair task of the digital twin is carried out. Accurate image segmentation (IS) is crucial to the performance of the OR system. However, the restoration industry lacks open datasets, so this study uses both the handbag open dataset and a self-built restoration object dataset to demonstrate the feasibility of the proposed segmentation method. The handbag dataset, comprising 550 RGB images and corresponding annotations, features complex backgrounds and diverse object types. Of these, 500 images are used for training and 50 are reserved for testing. The self-built dataset comprises five restoration objects, namely ceramic cups, screws, intersecting pipes, terracotta warrior heads, and impellers, with a total of 300 image samples, of which 270 are used for training and the remaining 30 for testing. Data augmentation techniques such as cropping, rotation, and zooming are applied to support comprehensive training of the model. The accuracy and loss curves during training are shown in Figure 8. UNet, ResNet, and SegNet are used as comparison models. The overall trends of the models are consistent: accuracy rises rapidly and loss falls sharply during iterations 0–10, both changes slow during iterations 20–40, and the curves are essentially stable after 80 iterations. Among the models, the proposed MR-UNet model with fused AM performs best, with an accuracy of 0.942 and a loss value of 0.099 at steady state; compared with the UNet model, the accuracy is improved by 0.09 and the loss value is reduced by 0.011.

Figure 8.

Comparative test of model training.

To validate the performance of the proposed IS model, the mean intersection over union (mIOU), a commonly used evaluation metric in the IS field, is selected first for comparative testing. The mIOU quantifies segmentation accuracy by measuring the pixel overlap between the predicted and ground truth masks. The intersection over union for class c is given in equation (11), where TP_c, FP_c, and FN_c are the numbers of true positive, false positive, and false negative pixels for class c; the mIOU is the mean of IOU_c over all classes.

IOU_{c} = \frac{TP_{c}}{TP_{c} + FP_{c} + FN_{c}}

(11)
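As a minimal sketch of how equation (11) turns into an mIOU score, the following Python snippet counts TP, FP, and FN per class on two small label masks and averages the per-class IOU. The function names and the toy masks are illustrative assumptions, not the study's evaluation code, and classes absent from both masks are skipped here, which is one common convention among several.

```python
from typing import Dict, List, Sequence


def per_class_iou(pred: Sequence[Sequence[int]],
                  truth: Sequence[Sequence[int]],
                  num_classes: int) -> Dict[int, float]:
    """Return IOU_c = TP_c / (TP_c + FP_c + FN_c) for each class that occurs."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                tp[p] += 1
            else:
                fp[p] += 1  # class p was predicted where it is not present
                fn[t] += 1  # a pixel of the true class t was missed
    ious: Dict[int, float] = {}
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # skip classes absent from both masks (assumed convention)
            ious[c] = tp[c] / denom
    return ious


def mean_iou(pred: Sequence[Sequence[int]],
             truth: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Average the per-class IOU values over the classes that occur."""
    ious = per_class_iou(pred, truth, num_classes)
    if not ious:
        raise ValueError("no class occurs in either mask")
    return sum(ious.values()) / len(ious)


# Toy 2 x 4 masks: class 0 is background, class 1 is the object.
truth: List[List[int]] = [[0, 0, 1, 1],
                          [0, 0, 1, 1]]
pred: List[List[int]] = [[0, 0, 1, 1],
                         [0, 1, 1, 1]]

print(per_class_iou(pred, truth, num_classes=2))
print(mean_iou(pred, truth, num_classes=2))
```

In the toy masks, class 0 has IOU 3/4 and class 1 has IOU 4/5, so the mean is 0.775.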

The comparative mIOU results for the proposed multiscale residual UNet (MR-UNet) model are shown in Table 1. The traditional UNet reaches an mIOU of 0.972 on the self-built dataset, but its parameter count (31.03 M) is far higher than that of the other models. With the CSP Block improvement, the mIOU drops to 0.943 while the number of parameters drops to 9.99 M. Adding the attention module to the CSP Block + UNet model restores the mIOU on the self-built dataset to the traditional UNet level (0.973) while keeping the parameter count low (10.01 M). These results indicate that the improvements to the UNet model are effective: they substantially reduce the number of model parameters without reducing the mIOU. The proposed model also achieves good precision on the self-built dataset while maintaining a high level of recall.

Table 1.

Results of the mIOU comparison of the different image segmentation models.

Comparison algorithm	mIOU (self-built dataset)	mIOU (packet lifting dataset)	Parameters	Precision (self-built dataset)	Recall (packet lifting dataset)
UNet	0.972	0.906	31.03 M	0.916	0.836
Res-Net	0.979	0.909	25.94 M	0.921	0.838
SegNet	0.979	0.921	29.44 M	0.914	0.841
L-UNet	0.989	0.917	22.28 M	0.928	0.842
CSP Block + UNet	0.943	0.878	9.99 M	0.916	0.861
CSP Block + attention + UNet	0.973	0.903	10.01 M	0.911	0.849
DeepLabV3+	0.983	0.901	16.78 M	0.921	0.858
Mask R-CNN	0.976	0.903	15.26 M	0.919	0.846

Single images from the dataset, each containing multiple objects with significant differences in morphology, color, and size, were selected for IS comparison testing, and the visualization results are shown in Figure 9. Compared with UNet, the proposed method exhibits similar segmentation performance with slightly sharper and clearer contours. Based on these results, the MR-UNet model is next tested for real-virtual coordinate matching performance.

Figure 9.

Comparison of image segmentation recognition results.

Real-virtual matching relies on obtaining an accurate segmented image and then using the minimum bounding rectangle to obtain the object center coordinates. The object is placed on a 600 × 600 restoration platform, and the input image size is set to 640 × 640. The results of several experiments are shown in Figure 10, which compares the predicted and actual center positions of the terracotta warriors, pipelines, leather bags, and other objects. Overall, the error between the predicted and actual object center coordinates does not exceed 1 mm in either the horizontal or the vertical direction, which meets the requirements of most industrial applications and verifies the validity of the proposed matching algorithm. After the scene is built, the DT error is first reduced by the real-virtual coordinate matching method, and then the repair trajectory planning is carried out.

Figure 10.

Repeated tests of real-virtual matching.
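The center-extraction step described above can be sketched as follows. This is a simplified illustration under several assumptions that the text does not confirm: the rectangle is axis-aligned (a rotated minimum-area rectangle would generally give a different center), the 600 × 600 platform is measured in millimeters, and the 640 × 640 image covers the platform exactly with aligned axes. The function names are hypothetical.

```python
from typing import List, Tuple


def bounding_rect_center(mask: List[List[int]]) -> Tuple[float, float]:
    """Center (x, y), in pixels, of the axis-aligned rectangle around the foreground."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    # Pixel k spans [k, k + 1), so the rectangle runs from min to max + 1.
    x_center = (min(cols) + max(cols) + 1) / 2.0
    y_center = (min(rows) + max(rows) + 1) / 2.0
    return x_center, y_center


def pixel_to_platform(x_px: float, y_px: float,
                      image_size: int = 640,
                      platform_size_mm: float = 600.0) -> Tuple[float, float]:
    """Scale pixel coordinates to platform coordinates (assumed to be in mm)."""
    scale = platform_size_mm / image_size  # assumed: image spans the whole platform
    return x_px * scale, y_px * scale


# Toy 8 x 8 mask with an object occupying rows 2-3 and columns 4-7.
mask = [[0] * 8 for _ in range(8)]
for r in (2, 3):
    for c in (4, 5, 6, 7):
        mask[r][c] = 1

center_px = bounding_rect_center(mask)
center_mm = pixel_to_platform(*center_px, image_size=8)
print(center_px, center_mm)
```

Under the same assumptions, a 640-pixel image over a 600 mm platform gives 600/640 = 0.9375 mm per pixel, so a 1 mm tolerance corresponds to roughly one pixel of boundary error.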

The trajectory planning for the OR system is shown in Figure 11. The trajectory completely covers the pipe. Because a single robot does not have enough degrees of freedom for the intersecting pipe welding task, a dual-robot collaborative approach is used, in which the KUKA robot and the GSK robot exchange messages at predetermined points on the collaborative trajectory. The GSK robot grips and moves the repair object while the KUKA robot plans the weld path. This process is repeated until all repair trajectories have been completed. Finally, both robots return to the home position, completing the repair task. After the simulation is complete, the robots’ DT is realized by publishing the trajectories and transferring them to the real robots; the simulation side of the repair system then plays the animation in real time while the real robots carry out the same repair task.

Figure 11.

OR system trajectory planning.
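The message exchange at predetermined points can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in, not the study's controller code: two Python threads play the roles of the GSK and KUKA robots, and `threading.Event` objects stand in for the messages sent at each synchronization point. The function name and log strings are hypothetical.

```python
import threading
from typing import List


def run_cooperative_repair(num_segments: int) -> List[str]:
    """Simulate two robots that hand over control at each trajectory point."""
    log: List[str] = []
    # One pair of events per predetermined synchronization point.
    positioned = [threading.Event() for _ in range(num_segments)]
    welded = [threading.Event() for _ in range(num_segments)]

    def gripping_robot() -> None:
        # Stands in for the GSK robot, which holds and moves the workpiece.
        for i in range(num_segments):
            log.append(f"GSK: position segment {i}")
            positioned[i].set()   # message: workpiece is in place
            welded[i].wait()      # hold still until the weld is reported done

    def welding_robot() -> None:
        # Stands in for the KUKA robot, which works along the weld path.
        for i in range(num_segments):
            positioned[i].wait()  # wait for the workpiece to be in place
            log.append(f"KUKA: weld segment {i}")
            welded[i].set()       # message: segment finished

    threads = [threading.Thread(target=gripping_robot),
               threading.Thread(target=welding_robot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    log.append("both robots: return home")
    return log


if __name__ == "__main__":
    for entry in run_cooperative_repair(3):
        print(entry)
```

Because each robot blocks until the other has signaled, the positioning and welding steps strictly alternate, which mirrors the repeat-until-complete cycle described above.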

The planned trajectory of each joint of the robotic arm and the real motion trajectory are shown in Figure 12. Figure 12(a) shows the trajectory of each joint in offline programming and simulation, and Figure 12(b) shows the real trajectory synchronized with the simulation side. As illustrated in Figure 12, the actual movement trajectory of the repair robot is largely aligned with the planned trajectory. This consistency indicates that the designed robot repair system can follow the planned trajectory precisely and accurately execute the intended repair task.

Figure 12.

Comparison of real and simulated joint trajectories.
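The text does not state how agreement between the planned and real joint trajectories is quantified. One plausible way to do so, shown here only as a hedged sketch, is to sample both trajectories at the same time steps and report, per joint, the maximum absolute deviation and the fraction of samples that stay within a tolerance. The joint names, sample values, and tolerance are hypothetical.

```python
from typing import Dict, List


def joint_tracking_report(planned: Dict[str, List[float]],
                          actual: Dict[str, List[float]],
                          tolerance_deg: float) -> Dict[str, Dict[str, float]]:
    """Compare planned and measured joint angles sampled at the same instants."""
    report: Dict[str, Dict[str, float]] = {}
    for joint, plan in planned.items():
        real = actual[joint]
        if not plan or len(plan) != len(real):
            raise ValueError(f"sample mismatch for joint {joint}")
        errors = [abs(p - r) for p, r in zip(plan, real)]
        within = sum(1 for e in errors if e <= tolerance_deg)
        report[joint] = {
            "max_error": max(errors),                 # worst-case deviation
            "within_tolerance": within / len(errors)  # share of samples on track
        }
    return report


# Hypothetical joint angles in degrees for two joints.
planned = {"A1": [0.0, 10.0, 20.0, 30.0], "A2": [5.0, 5.0, 5.0, 5.0]}
actual = {"A1": [0.0, 10.5, 19.5, 33.0], "A2": [5.0, 5.25, 4.75, 5.0]}

print(joint_tracking_report(planned, actual, tolerance_deg=1.0))
```

A report of this kind makes the qualitative statement "largely aligned" checkable per joint, although the threshold that counts as aligned remains a design choice.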

To evaluate the robustness of the model beyond the training distribution, cross-dataset validation is conducted. The model is trained on samples from the self-built dataset and tested on the public PASCAL VOC 2012 dataset (1449 images with segmentation masks). The results are shown in Table 2. Although performance on VOC lags behind in-domain testing, the model maintains competitive segmentation capability for geometrically similar objects, such as ceramic cups, while performance degrades for classes with different textures. This suggests that the model prioritizes structural features over appearance, enabling it to transfer to unseen industrial objects with similar shapes.

Table 2.

Cross-dataset validation results.

Metric	Self-built test	VOC 2012
mIOU	0.973	0.823
Pixel Acc	0.988	0.917

The system-level workflow of the designed model is shown in Figure 13. The end-to-end data pipeline is as follows: the target image is segmented by MR-UNet; the centroid of the object in the segmentation mask drives the alignment of virtual and real coordinates; the verified coordinates are used to generate collision-free paths in the digital twin; and these paths are then converted into robot-specific CNC code for physical execution.

Figure 13.

The data flow path from image input to segmentation, virtual matching, and robot execution.
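The last stage of this pipeline, converting a planned path into controller code, can be sketched in a generic form. The example below emits plain G-code-style lines (G21 for millimeter units, G90 for absolute coordinates, G0 for a rapid move, G1 for a linear feed move). This format is an assumption chosen for illustration; the KUKA and GSK controllers use their own program formats, which the text does not detail, and the function name, waypoints, and feed rate are hypothetical.

```python
from typing import Iterable, List, Tuple

Waypoint = Tuple[float, float, float]  # (x, y, z), assumed to be in millimeters


def waypoints_to_nc_lines(waypoints: Iterable[Waypoint],
                          feed_mm_per_min: float = 300.0) -> List[str]:
    """Turn a list of path points into generic G-code-style motion lines."""
    lines = ["G21", "G90"]  # G21: millimeter units, G90: absolute coordinates
    for index, (x, y, z) in enumerate(waypoints):
        if index == 0:
            # Rapid move to the start of the repair path.
            lines.append(f"G0 X{x:.3f} Y{y:.3f} Z{z:.3f}")
        else:
            # Linear move along the repair path at the given feed rate.
            lines.append(
                f"G1 X{x:.3f} Y{y:.3f} Z{z:.3f} F{feed_mm_per_min:.0f}")
    return lines


# Hypothetical three-point path in platform coordinates.
path = [(10.0, 20.0, 5.0), (15.0, 20.0, 5.0), (15.0, 25.0, 5.0)]
for line in waypoints_to_nc_lines(path):
    print(line)
```

Keeping the path as plain coordinate tuples until this final step means the segmentation, matching, and planning stages stay independent of any particular robot vendor.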

Results and analysis

In this section, the comprehensive performance of the proposed multiscale residual UNet image segmentation model (MR-UNet) and the digital twin object repair system was evaluated along three dimensions: model accuracy, real-virtual matching accuracy, and system repair efficiency. The MR-UNet model with an integrated AM achieved an mIOU of 0.973 on the self-built dataset of repair objects with only 10.01 M parameters, matching the original UNet (0.972) at roughly one third of its parameter count. Compared with the original UNet, the CSP Block structure reduced parameters by 67.8%, and the AM module added only 0.02 M parameters to restore the segmentation accuracy. MR-UNet produced clearer segmentation contours in complex areas, such as the texture of terracotta figurine heads and pipeline edges, which suggests that residual connections and the AM effectively limit the information loss caused by pooling. Although its mIOU was slightly lower than that of Res-Net on the public handbag dataset, its parameter count was only 38.6% of the latter, supporting the practicality of the model in scenarios with limited computing power.
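The parameter figures quoted in this paragraph can be re-derived from the parameter counts listed in Table 1. The short check below is only an arithmetic illustration; the dictionary keys and helper names are chosen here for readability.

```python
# Parameter counts in millions, copied from Table 1.
params_m = {
    "UNet": 31.03,
    "Res-Net": 25.94,
    "CSP Block + UNet": 9.99,
    "CSP Block + attention + UNet": 10.01,
}


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def percent_of(part: float, whole: float) -> float:
    """Size of `part` expressed as a percentage of `whole`."""
    return part / whole * 100.0


csp_reduction = percent_reduction(params_m["UNet"], params_m["CSP Block + UNet"])
attention_cost = (params_m["CSP Block + attention + UNet"]
                  - params_m["CSP Block + UNet"])
share_of_resnet = percent_of(params_m["CSP Block + attention + UNet"],
                             params_m["Res-Net"])

print(f"CSP Block reduction vs UNet: {csp_reduction:.1f}%")
print(f"Attention module cost: {attention_cost:.2f} M parameters")
print(f"MR-UNet size relative to Res-Net: {share_of_resnet:.1f}%")
```

The three printed values reproduce the 67.8%, 0.02 M, and 38.6% figures stated above.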

Based on the segmentation results of MR-UNet, the system extracted the center coordinates of the object through the minimum bounding rectangle. The predicted center coordinates of objects such as terracotta figurines, pipes, and leather bags differed from the actual values by less than 1 mm in both the X and Y directions, which met the requirements for industrial-precision repairs. The main sources of error were CCD lens distortion (incomplete correction of the MER2-231-41U3M/C camera) and small fluctuations in the segmented boundaries of irregular objects. A single matching took about 2.3 s, which limited the repair efficiency and calls for better algorithm parallelism.

The dual-robot collaboration strategy significantly improved the capability to handle complex tasks. The two robots successfully planned and executed full-path welding of intersecting pipes, which a single robot could not complete because of insufficient degrees of freedom. The overlap between the real joint trajectory and the offline planned trajectory exceeded 98%, verifying the accuracy of the dynamic simulation of the digital twin model. The time for a typical complex target task was shortened to less than 15 s, roughly 40% shorter than in single-robot operation, and the repair pass rate reached 99.2%. The study further compared the performance of single-robot and dual-robot operation, and the results are shown in Table 3. Dual-robot coordination reduced the average complex repair time by 43.7% and raised the task success rate to 99.2%. Although it increased end-to-end latency by 7.9% because of communication overhead between the robots, it achieved complete six-degree-of-freedom path coverage, which a single robot could not achieve. The MER2-231-41U3M/C camera provided 1280 × 960 images at 20 FPS, and MR-UNet segmentation added a delay of 48 ms per frame. The 6.3% reduction in frame rate is an acceptable cost for the 98.5% path coverage. The synchronization checks slightly increase the delay, but they prevent collisions in enclosed spaces.

Table 3.

Dual-robot and single-robot performance.

Metric	Single-robot	Dual-robot	Improvement
Average repair time	25.4 ± 3.2 s	14.3 ± 1.8 s	43.7% ↓
Task success rate	84.6%	99.2%	14.6 percentage points ↑
Path coverage	71.3%	98.5%	27.2 percentage points ↑
Collision avoidance	78%	100%	22 percentage points ↑
Frame rate (FPS)	22.1	20.7	6.3% ↓
End-to-end latency	3.8 ± 0.4 s	4.1 ± 0.5 s	7.9% ↑
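The last column of Table 3 mixes two kinds of change: repair time, frame rate, and latency are relative changes with respect to the single-robot value, whereas the success, coverage, and collision-avoidance rows are absolute differences between two percentages. The following check, an illustrative sketch with helper names chosen here, recomputes both kinds from the table values.

```python
def relative_change(single: float, dual: float) -> float:
    """Change from the single-robot to the dual-robot value, in percent."""
    return (dual - single) / single * 100.0


def point_difference(single_pct: float, dual_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return dual_pct - single_pct


# (single-robot, dual-robot) mean values copied from Table 3.
relative_rows = {
    "Average repair time (s)": (25.4, 14.3),
    "Frame rate (FPS)": (22.1, 20.7),
    "End-to-end latency (s)": (3.8, 4.1),
}
point_rows = {
    "Task success rate (%)": (84.6, 99.2),
    "Path coverage (%)": (71.3, 98.5),
    "Collision avoidance (%)": (78.0, 100.0),
}

for name, (single, dual) in relative_rows.items():
    print(f"{name}: {relative_change(single, dual):+.1f}%")
for name, (single, dual) in point_rows.items():
    print(f"{name}: {point_difference(single, dual):+.1f} points")
```

The recomputed values agree with the table: a 43.7% shorter repair time, a 6.3% lower frame rate, a 7.9% higher latency, and gains of 14.6, 27.2, and 22 percentage points.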

Deployment feasibility and limitations

The deployment of industrial repair systems faces three integration challenges. The first is the compatibility of the various communication protocols. The robotic arms (KUKA/GSK), camera (MER2-231-41U3M/C), and host computer need EtherCAT/CAN bus protocol conversion through ROS-I middleware, and the closed real-time data interface of the GSK robot required an additional 23% of adaptation time during development. The second is contention for computing resources: virtual matching and trajectory planning compete for CPU resources during synchronous operation, resulting in a peak load of 92%. To address this, a CPU-GPU task offloading strategy is adopted that binds the segmentation computation to an NVIDIA RTX 3090. The third is a compromise on environmental robustness. Electromagnetic interference at industrial sites causes a packet loss rate of up to 5.2% in camera-robot communication. Redundant TCP retransmission mechanisms and fiber-optic isolation reduce this rate to 0.3%, at the cost of higher end-to-end latency.

To meet the ISO 10218-1/2 industrial robot safety specification, the system integrates three layers of protection. First, a hardware emergency stop is implemented through a safety relay connected in series with the dual-robot enable circuit. Second, dynamic obstacle avoidance is achieved through 2D LiDAR scanning and real-time braking at virtual fences. Third, force-control rollback is achieved through impedance control triggered by a six-axis force sensor.

The limitations of the designed system fall into three categories: real-time bottlenecks, environmental sensitivity, and cost constraints. In terms of real-time bottlenecks, virtual coordinate matching takes 2.3 s, which limits use on high-speed production lines. In terms of environmental sensitivity, segmentation accuracy decreases by 37.6% in dusty or strongly lit scenes, so the system relies on active light source compensation. In terms of cost, the hardware cost of the dual-robot system is 2.8 times higher than that of a single-robot system, which makes deployment difficult for small and medium-sized enterprises. Future research will deploy an SVD algorithm on an FPGA to accelerate virtual matching of the target by a factor of five. It will also integrate a millimeter wave radar and an RGB-D camera and develop a solution with a single master robot and an auxiliary positioning fixture.

Conclusion

To address the underperformance of traditional OR techniques in handling large-scale or complex object image data, this study designs an OR system based on digital twin technology (DTT). An AM is introduced to construct the MR-UNet-OIS model, which enables more precise localization of repair objects and supports OR trajectory planning. The performance tests indicated that the proposed MR-UNet model with the fused AM achieved the highest accuracy, 0.942, and the lowest loss value, 0.099, at steady state. Compared with the UNet model, it improved the accuracy by 0.09 and reduced the loss value by 0.011. In comparison with the conventional UNet model, the proposed model improved prediction accuracy on the self-built dataset by 0.01 while reducing the number of parameters by 20.02 M. The attention module improved the prediction accuracy while adding only 0.02 M parameters. The proposed method showed segmentation performance similar to that of UNet, with greater clarity and sharper contours. For objects including pipes and leather bags, the deviation between the coordinates predicted by the OR system and the actual coordinates was no more than 1 mm in both the vertical and horizontal directions. With the repair system for large, complex targets, the repair task time could be shortened to less than 15 s, which effectively improved repair efficiency. The trajectory planning experiment indicated that the pipeline was fully covered by the trajectory. Moreover, the study used dual-robot collaboration to better complete complex repair tasks. This enhanced the repair capability of the system and allowed all-round repair tasks to be completed that a single robot cannot complete.

In conclusion, this study integrates an AM to construct the MR-UNet-OIS model, with the objective of achieving accurate real-virtual image matching and improving the accuracy and efficiency of repairs for large, complex targets. The designed repair system can significantly improve the repair efficiency for such targets. The model also has potential in the field of surgery, and the designed restoration method could support the high-precision restoration of ancient buildings and help preserve the national intangible cultural heritage of architecture. However, the model takes a long time to match the real-virtual coordinates of the repair target, which considerably limits repair efficiency. Furthermore, the model size of this method depends on the size of the virtual model that is constructed. Repair efficiency for large and complex targets may be further improved by enhancing the virtual coordinate matching algorithm and reducing the model size.

Footnotes

ORCID iD

Bo Xie

Funding

The research is supported by the Science and Technology Department of Sichuan Province, “The Deep Diving of Flood Dragon” (No. 2020JDKP0041).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
