Calibration-free point cloud colorization for non-rigidly coupled LiDAR–camera systems

Abstract

Non-rigidly coupled LiDAR-camera systems are increasingly adopted for 3D reconstruction and autonomous robots. However, point cloud colorization remains challenging for such systems because no fixed extrinsic parameters exist between the sensors. This paper proposes a novel colorization framework for self-built non-rigid systems. It decomposes the colorization task into localized 3D-2D projective transformation estimation, which mitigates the lack of fixed extrinsics. Each image then colors the local 3D scene within its view frustum. When these scenes are merged, overlapping regions yield multiple coloring candidates for individual 3D points. To resolve this, we introduce a coloring reliability assessment method that selects the optimal coloring source per point by evaluating projection geometry and reprojection errors. We conduct comprehensive experiments on self-collected real-world outdoor datasets, reporting the quantitative accuracy of 3D-2D transformation estimation and a qualitative analysis of the colorized point clouds. We also perform an ablation study on the key module and a cross-system validation on rigidly coupled systems. Together, these results demonstrate the effectiveness and extensibility of the proposed method.

Keywords

point cloud colorization, LiDAR, camera, non-rigidly coupled system, extrinsic calibration

Introduction

LiDAR-camera systems have been widely adopted in 3D reconstruction and autonomous robotic systems to produce colorized point clouds.¹ In a typical LiDAR-camera system, the components are rigidly connected. A pre-calibration step suffices to determine the fixed extrinsic parameters between the LiDAR and camera that define their relative spatial relationship.²,³ These fixed parameters enable the mapping of each LiDAR point to a corresponding image pixel, thereby coloring the LiDAR points. Subsequently, techniques such as simultaneous localization and mapping (SLAM) are employed to accumulate these points into a global colorized point cloud.⁴

Non-rigidly coupled LiDAR-camera systems are also frequently encountered in practice. They arise from a variety of real-world setups, such as mechanical jitter in the mounting of originally rigidly coupled systems, integrated pan-tilt camera-LiDAR combinations, or the more common design illustrated in Figure 1, which employs a servo motor to drive LiDAR rotation. Compared with rigidly mounted LiDAR sensors, this servo-driven rotating setup significantly and flexibly extends the LiDAR’s perceptual coverage, bringing notable perception gains for handheld devices and robotic systems.⁵,⁶ However, because the servo-driven rotation of the LiDAR makes the coupling non-rigid, the extrinsic parameters between the LiDAR and the camera become time-varying, and fixed extrinsic parameters can no longer be obtained via a single pre-calibration step as in rigidly coupled systems. This not only presents a significant challenge for point cloud colorization,⁵ but also raises difficulties for many other key robotic tasks, including SLAM,⁷ multi-sensor fusion, and multi-modal perception.⁸

Figure 1.

The top image shows a self-built non-rigidly coupled LiDAR-camera system. The bottom image shows the colorized point cloud of a street-level commercial building facade produced by the proposed framework.

To enable continuous estimation of system extrinsic parameters over time, instead of relying on a one-time pre-calibration, numerous studies have investigated online calibration approaches.⁹⁻¹¹ Such methods treat the LiDAR-camera extrinsic parameters as optimizable variables for continuous iterative estimation and integrate them as a submodule into algorithms such as multi-sensor fusion SLAM, thereby improving the accuracy of both the extrinsic estimation and the entire system.¹² However, existing online calibration methods are still designed for rigidly coupled sensors. They can only handle minor extrinsic variations caused by factors such as long-term loosening of the mechanical mounting, and cannot adapt to the rapid extrinsic variations induced by servo-driven LiDAR rotation, which is the scenario addressed in this paper.

For systems such as servo-driven LiDAR setups, several studies leverage motor encoder data to measure the real-time rotation angle of the LiDAR, and thus treat such systems as rigidly coupled.¹³,¹⁴ They first transform the LiDAR point cloud into the motor’s coordinate frame, which establishes fixed extrinsic parameters between the motor frame and the camera. Colors are then mapped to the point cloud in the motor frame and subsequently accumulated using SLAM. However, these approaches require high-precision synchronization among the LiDAR, the motor encoder, and the camera, which is challenging for self-built systems,¹³,¹⁵ and they are inapplicable to non-rigidly coupled systems without access to servo angle information.

To address these challenges, we propose a novel point cloud colorization framework specifically designed for non-rigidly coupled LiDAR-camera systems. Crucially, it enables colorization on any self-built system without requiring hardware calibration or data synchronization, greatly facilitating the design and implementation of autonomous robotic systems. It first accumulates LiDAR points into a global map using SLAM and segments this map into local scenes. For each scene and its corresponding image, a 3D-to-2D projective transformation is estimated to enable colorization within the local scene. When scenes are merged, LiDAR points that can be mapped to multiple images undergo a coloring reliability evaluation to select the optimal coloring source, maximizing overall colorization quality. The contributions of this work can be summarized as follows:

(1) A novel point cloud colorization framework is designed explicitly for non-rigidly coupled LiDAR-camera systems, enabling its application on any self-built hardware without requiring extrinsic calibration or data synchronization.

(2) A coloring reliability evaluation method is introduced to robustly resolve ambiguous color mappings by selecting the optimal image source for each LiDAR point, thereby ensuring high-quality and consistent colorization.

Methodology

As shown in Figure 2, our framework comprises three main modules. The preprocessing module ingests raw LiDAR scans and camera images, and accumulates a global point cloud via LiDAR SLAM while estimating a coarse camera pose for each image frame. It then partitions the global cloud into local point cloud scenes, each roughly aligned with the view frustum of the corresponding image, forming image-point cloud pairs. Subsequently, the target-free 3D-to-2D transformation estimation module automatically computes the transformation matrix for each pair using a state-of-the-art target-less extrinsic calibration method.³ Finally, the multi-view colored points merging module evaluates coloring reliability across all potential mappings for each LiDAR point and selects the optimal image source to produce a high-fidelity colorized point cloud.

Figure 2.

The point cloud colorization framework for non-rigidly coupled LiDAR-camera systems.

Notations

We denote the world, LiDAR, and camera coordinate frames as {W}, {L}, and {C}, respectively. Given the global point cloud set $P^{W}$ and the captured image sequence $I_{1}, \ldots, I_{N}$, our framework computes accurate camera poses $T_{C_{1}}^{W}, \ldots, T_{C_{N}}^{W}$ for all images. For each point $p^{W} \in P^{W}$, it identifies the optimal image $I$ for color projection. Then, the pixel $y$ in $I$ used to color $p^{W}$ is given by

$$y = \pi\left( (T_{C}^{W})^{-1} p^{W} \right) \tag{1}$$

where $\pi : \{C\} \to \mathbb{R}^{2}$ represents the classical camera projection model, defined by the camera’s intrinsic parameters, which are treated as known quantities in this work.
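As a concrete reading of Eq. 1, the following minimal sketch projects a single world point into an image. It assumes a distortion-free pinhole model with the camera's optical axis along +z; the function name `project_world_point`, the intrinsic matrix `K`, and the numerical values are illustrative assumptions, not part of the original work.

```python
import numpy as np

def project_world_point(p_world, T_cam_in_world, K):
    """Eq. 1 under an assumed pinhole model: y = pi(inv(T_C^W) @ p^W).

    p_world:        3-vector in the world frame {W}
    T_cam_in_world: 4x4 camera pose T_C^W (camera frame expressed in {W})
    K:              3x3 intrinsic matrix (assumed known, no distortion)
    Returns the pixel (u, v), or None if the point is behind the camera.
    """
    p_h = np.append(np.asarray(p_world, dtype=float), 1.0)  # homogeneous point
    p_cam = np.linalg.inv(T_cam_in_world) @ p_h              # world -> camera
    x, y, z = p_cam[:3]
    if z <= 0.0:
        return None  # not in front of the camera
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v])

# Hypothetical intrinsics for a 1280x1024 image and a camera shifted 1 m along x.
K = np.array([[500.0, 0.0, 640.0],
              [0.0, 500.0, 512.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 1.0
pixel = project_world_point([2.0, 0.0, 2.0], T, K)
```

A real camera would normally also need a lens distortion model, which is omitted here for brevity.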

Preprocessing

This module resolves the absence of fixed extrinsic parameters by decomposing global point cloud colorization into independent local scene tasks. The global point cloud can be generated by any LiDAR SLAM algorithm (FAST-LIO2¹⁶ in our work). Critically, our framework bypasses gimbal coordinate transformations, which are infeasible for custom setups such as the one in Figure 1, by having SLAM directly compute the LiDAR-to-world transformation $T_{L}^{W}$. As a result, the relative pose between the two sensors, $\Delta T = T_{C}^{W} \cdot (T_{L}^{W})^{-1}$, is time-varying. We therefore partition the map into local scenes and solve for $\Delta T$ per camera viewpoint.

For each image $I_{i}$, we crop a local point cloud $P_{i}^{W}$ from the global cloud $P^{W}$ that approximately falls within the view frustum of $I_{i}$, forming an image-point cloud pair ($I_{i}$, $P_{i}^{W}$). To initialize this process, the six-degree-of-freedom (6-DoF) pose $T_{C_{i}}^{W}$ of $I_{i}$ in the world coordinate frame must first be roughly estimated. For typical non-rigidly coupled LiDAR-camera systems, most of the six pose components of the LiDAR and the camera differ only negligibly. Taking the custom-built system in Figure 1 as an example, the 3-DoF position components and the pitch/yaw attitude components of the LiDAR and the camera are nearly identical; only the roll angle differs significantly and varies drastically over time, owing to the servo-driven rotation of the LiDAR. To keep the framework simple, we therefore use the LiDAR poses obtained during mapping to roughly estimate most pose components of the camera. Specifically, for the timestamp $t_{C_{i}}$ of $I_{i}$, we first identify the two closest LiDAR SLAM pose timestamps $t_{L_{k}}$ and $t_{L_{k+1}}$ immediately before and after $t_{C_{i}}$. We then linearly interpolate the corresponding LiDAR poses $T_{L_{k}}^{W}$ and $T_{L_{k+1}}^{W}$ to derive the 3-DoF position components and the pitch/yaw attitude components of the camera pose $T_{C_{i}}^{W}$. The roll angle cannot be taken from the LiDAR, because the LiDAR roll differs greatly from that of the camera. However, the human or robotic carriers hosting the sensor system maintain a nearly horizontal attitude in most practical scenarios, with negligible variations in roll angle. For this reason, the roll angle of $T_{C_{i}}^{W}$ is approximated as zero in this work. Combining this estimated pose $T_{C_{i}}^{W}$ with the camera field of view (FOV) parameters enables the extraction of $P_{i}^{W}$ within the valid viewing range.
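The coarse pose estimation and frustum cropping described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: poses are 4x4 matrices, rotations follow a Z-Y-X (yaw-pitch-roll) convention, and the camera body frame has x forward, y left, and z up. The function names and the `max_range` parameter are hypothetical.

```python
import numpy as np

def yaw_pitch_from_R(R):
    """Yaw and pitch of R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    return yaw, pitch

def rot_zy(yaw, pitch):
    """Rotation Rz(yaw) @ Ry(pitch), i.e. with the roll angle fixed to zero."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    return Rz @ Ry

def coarse_camera_pose(t_cam, t_k, T_k, t_k1, T_k1):
    """Interpolate two LiDAR poses at the image timestamp and drop the roll."""
    s = (t_cam - t_k) / (t_k1 - t_k)
    pos = (1.0 - s) * T_k[:3, 3] + s * T_k1[:3, 3]
    yaw0, pitch0 = yaw_pitch_from_R(T_k[:3, :3])
    yaw1, pitch1 = yaw_pitch_from_R(T_k1[:3, :3])
    # Interpolate yaw along the shortest arc to avoid wrap-around at +/- pi.
    dyaw = np.arctan2(np.sin(yaw1 - yaw0), np.cos(yaw1 - yaw0))
    T = np.eye(4)
    T[:3, :3] = rot_zy(yaw0 + s * dyaw, (1.0 - s) * pitch0 + s * pitch1)
    T[:3, 3] = pos
    return T

def crop_to_frustum(points_world, T_cam, hfov_deg, vfov_deg, max_range):
    """Boolean mask of world points inside an x-forward view frustum."""
    R, t = T_cam[:3, :3], T_cam[:3, 3]
    p = (np.asarray(points_world, dtype=float) - t) @ R  # rows: R^T (p - t)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    h_ok = np.abs(np.arctan2(y, x)) <= np.radians(hfov_deg) / 2.0
    v_ok = np.abs(np.arctan2(z, x)) <= np.radians(vfov_deg) / 2.0
    r_ok = np.linalg.norm(p, axis=1) <= max_range
    return (x > 0) & h_ok & v_ok & r_ok
```

Because the result only seeds the subsequent transformation estimation, a generous FOV margin is a reasonable choice when cropping, so that small errors in the coarse pose do not remove relevant points.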

Target-free 3D-2D transformation estimation

For each pair ($I_{i}$, $P_{i}^{W}$), we compute the relative pose $\Delta T_{i}$ of the two sensors at time $i$. This yields the camera pose $T_{C_{i}}^{W} = \Delta T_{i} \cdot T_{L_{i}}^{W}$, enabling point coloring for the pair through Eq. 1. The computation of $\Delta T_{i}$ can be viewed as performing extrinsic calibration between the LiDAR and the camera at time $i$. To enable environment-agnostic automation, we adopt a state-of-the-art target-less extrinsic calibration method³ for $\Delta T_{i}$ estimation. This method follows a two-step optimization strategy. In the first step, it accumulates raw LiDAR point clouds to construct a dense scene representation and projects it into a 2D intensity image. For cross-modal correspondence estimation between the LiDAR intensity image and the camera image, it extracts keypoints via the SuperPoint detector,¹⁷ then infers reliable keypoint matches using the SuperGlue pipeline.¹⁸ An initial guess of the extrinsic parameters is subsequently obtained by minimizing the reprojection error of the matched keypoints. In the second step, the initial guess is refined through a direct LiDAR-camera registration approach that uses the normalized information distance as the similarity metric. In our implementation, we directly employ this well-established calibration framework to estimate $\Delta T_{i}$, with only one modification: the point cloud accumulation step of the original method is omitted, and we instead use $P_{i}^{W}$ as the input.
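The link between the estimated relative pose, the SLAM pose, and Eq. 1 can be made explicit with a short derivation. It uses only the definition of the relative pose from the preprocessing section and the identity $(AB)^{-1} = B^{-1}A^{-1}$; no additional assumptions are introduced.

```latex
\begin{align}
\Delta T_{i} \cdot T_{L_{i}}^{W}
  &= T_{C_{i}}^{W} \, (T_{L_{i}}^{W})^{-1} \, T_{L_{i}}^{W}
   = T_{C_{i}}^{W}, \\
y &= \pi\!\left( (T_{C_{i}}^{W})^{-1} p^{W} \right)
   = \pi\!\left( (T_{L_{i}}^{W})^{-1} \, \Delta T_{i}^{-1} \, p^{W} \right).
\end{align}
```

In other words, once $\Delta T_{i}$ is estimated for a pair, every point of the local scene can be projected with the SLAM pose and $\Delta T_{i}$ alone.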

While the target-free calibration approach estimates $\Delta T_{i}$ automatically, its accuracy depends on the content captured by both sensors. We quantitatively evaluate this accuracy using the 3D-2D reprojection error, whose definition is analogous to that in the applied calibration framework³ and is given by

$$RE = \frac{1}{M}\sum_{j = 1}^{M} \left\| y_{j} - \hat{y}_{j} \right\|^{2} = \frac{1}{M}\sum_{j = 1}^{M} \left\| \pi\left( (T_{C_{i}}^{W})^{-1} \hat{p}_{j}^{W} \right) - \hat{y}_{j} \right\|^{2} \tag{2}$$

where $M$ denotes the number of LiDAR-visual feature point correspondences ($\hat{p}_{j}^{W}$, $\hat{y}_{j}$) identified by SuperGlue for the current scene, and $y_{j}$ is the coordinate on the image plane obtained by transforming the LiDAR point $\hat{p}_{j}^{W}$ via the final estimated extrinsic parameters, following Eq. 1. During the subsequent fusion of multiple local scenes, this metric serves as the key criterion for selecting the optimal coloring image for individual LiDAR points.
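To make Eq. 2 concrete, the sketch below computes the mean squared reprojection error for a set of matched 3D points and image keypoints. It assumes a distortion-free pinhole model and that all matched points lie in front of the camera; the function name and the example values are illustrative only.

```python
import numpy as np

def reprojection_error(points_world, pixels_obs, T_cam_in_world, K):
    """Eq. 2: mean squared pixel distance over M correspondences.

    points_world:   (M, 3) matched LiDAR points in the world frame
    pixels_obs:     (M, 2) matched image keypoints
    T_cam_in_world: 4x4 estimated camera pose T_C^W
    K:              3x3 intrinsic matrix (assumed pinhole, no distortion)
    """
    pts = np.asarray(points_world, dtype=float)
    obs = np.asarray(pixels_obs, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    cam = (np.linalg.inv(T_cam_in_world) @ pts_h.T).T[:, :3]  # world -> camera
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                              # perspective division
    return float(np.mean(np.sum((uv - obs) ** 2, axis=1)))

# Two hypothetical correspondences: one off by (3, 4) pixels, one exact.
K = np.array([[500.0, 0.0, 640.0],
              [0.0, 500.0, 512.0],
              [0.0, 0.0, 1.0]])
points = [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
observed = [[643.0, 516.0], [890.0, 512.0]]
re_value = reprojection_error(points, observed, np.eye(4), K)  # (25 + 0) / 2
```

Note that with this definition RE is a mean of squared distances, so its unit is squared pixels rather than pixels.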

Multi-view colored points merging

While coloring is performed individually per image-point cloud pair, significant overlaps between scenes may cause a single LiDAR point to be colored by multiple images. Selecting the optimal coloring source during point cloud merging thus becomes critical. We observe that larger incidence angles and longer ranges between the camera and object surfaces correlate with higher coloring errors. We therefore propose a quality assessment metric that evaluates coloring reliability per point-image pair:

$$Q = w_{\alpha} e^{- k_{\alpha} \alpha} + w_{d} e^{- k_{d} d} + w_{r} e^{- k_{r} \, RE} \tag{3}$$

where $w_{\alpha}, w_{d}, w_{r}$ (with $w_{\alpha} + w_{d} + w_{r} = 1$) are the weighting coefficients and $k_{\alpha}, k_{d}, k_{r}$ are the decay rate factors. The incidence angle $\alpha$ is defined as the angle between the viewing vector (i.e., the approximated camera orientation vector) and the local surface normal at the target point. The imaging range $d$ corresponds to the distance between the camera position and the target point. Additionally, the quality metric $Q$ incorporates the reprojection error $RE$ obtained during projection matrix optimization (Eq. 2). Consequently, the image with maximum $Q$ is selected as the final coloring source for each point.
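The selection rule of Eq. 3 can be sketched as below. The coefficient values are those reported in the experimental setup; everything else is an assumption made for illustration: the incidence angle is taken in degrees, the range in meters, RE in squared pixels, and the normal's sign ambiguity is resolved by folding the angle into [0°, 90°]. The function names and candidate tuples are hypothetical.

```python
import numpy as np

# Decay rates and weights as reported in the experimental setup.
K_ALPHA, K_D, K_R = 0.026, 0.033, 0.0012
W_ALPHA, W_D, W_R = 0.3, 0.2, 0.5

def incidence_angle_deg(view_dir, normal):
    """Angle between viewing vector and surface normal, folded into [0, 90] deg.

    Folding with abs() is an assumption to handle normals of arbitrary sign.
    """
    v = np.asarray(view_dir, dtype=float)
    n = np.asarray(normal, dtype=float)
    c = abs(v @ n) / (np.linalg.norm(v) * np.linalg.norm(n))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def coloring_quality(alpha, d, re):
    """Eq. 3: weighted sum of exponentially decaying reliability terms."""
    return (W_ALPHA * np.exp(-K_ALPHA * alpha)
            + W_D * np.exp(-K_D * d)
            + W_R * np.exp(-K_R * re))

def select_best_image(candidates):
    """candidates: iterable of (image_id, alpha, d, re); returns the id with max Q."""
    return max(candidates, key=lambda c: coloring_quality(c[1], c[2], c[3]))[0]

# One point seen by two hypothetical images: a close frontal view with a poorly
# estimated transformation, and a more oblique view with a well-estimated one.
best = select_best_image([("img_a", 10.0, 5.0, 2000.0),
                          ("img_b", 30.0, 15.0, 50.0)])
```

With the reported weights, the reprojection term carries half of the score, so in this example the view with the better-estimated transformation is preferred despite its less favorable geometry.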

Experiments

Experimental setup

Due to the absence of public datasets for non-rigidly coupled LiDAR-camera systems, we collected ten outdoor data sequences (trajectory lengths: 20–100 m) across the Dalian Maritime University campus using the custom-built system shown in Figure 1. The hardware comprises a servo motor rotating an Ouster OS1-32 LiDAR, a rigidly mounted FLIR CM3-U3-13Y3C color camera capturing 1280×1024 images at 25 Hz, and an onboard computer (Intel i7-10510U CPU, 16 GB RAM) for data logging and offline processing.

For the decay coefficients in Eq. 3, we select representative data sequences and calculate the value distribution of the three metric terms. Each coefficient is set to the reciprocal of the 95th percentile of its corresponding term, which normalizes the different quantities into a consistent range [0,1]. The final coefficients are $k_{\alpha} = 0.026$, $k_{d} = 0.033$, and $k_{r} = 0.0012$. The weighting coefficients are empirically set to $w_{\alpha} = 0.3$, $w_{d} = 0.2$, and $w_{r} = 0.5$. In practical use, we find that minor variations in these values have a negligible influence on the coloring results, and that the values are suitable for most common scenarios.
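The percentile-based choice of decay coefficients can be illustrated with a few lines of NumPy. This is only a sketch: the sample array is synthetic, the function name is hypothetical, and the units of the implied percentiles are not stated in the text.

```python
import numpy as np

def decay_coefficient(samples, q=95.0):
    """Reciprocal of the q-th percentile of the observed values of one term."""
    return 1.0 / float(np.percentile(samples, q))

# Synthetic stand-in for an observed distribution (values 0, 1, ..., 100).
k_demo = decay_coefficient(np.arange(101))  # 95th percentile is 95

# Inverting the reported coefficients gives the 95th percentiles they imply.
implied = {name: round(1.0 / k, 1)
           for name, k in [("alpha", 0.026), ("d", 0.033), ("RE", 0.0012)]}
# implied == {"alpha": 38.5, "d": 30.3, "RE": 833.3}
```

With this choice, a term evaluated at its own 95th percentile contributes $e^{-1} \approx 0.37$ of its weight, which puts the three terms on a comparable scale regardless of their units.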

Results of 3D-to-2D transformation estimation

The accuracy of 3D-to-2D transformation estimation for each pair ($I_{i}$, $P_{i}^{W}$) critically determines point cloud coloring quality. We quantify this accuracy by computing deviations between automatically estimated transformations and manually calibrated references (using Koide et al.³). As shown in Table 1, statistical analysis of 15 randomly selected scenes reveals mean translation and rotation errors of 2.4 cm and 0.109°, respectively. This accuracy is comparable to the automatic calibration precision for rigidly coupled LiDAR-camera systems reported in Koide et al.,³ and such inter-scene error variation is consistent with their experimental results as well, demonstrating the efficacy of the transformation estimation approach within our framework. It should be noted that the estimation accuracy is influenced by environmental content in the scenes. To compensate for this variability, our framework automatically filters out images exhibiting large reprojection errors based on Eq. 2. Statistical analysis shows these discarded images are typically characterized by extreme viewing angles or insufficient visual features.
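The text does not spell out how the deviation between an estimated and a reference transformation is measured. One common convention, shown here purely as an assumption, is to form the relative transformation and report the norm of its translation and the angle of its rotation. The function name and example values are illustrative.

```python
import numpy as np

def pose_errors(T_est, T_ref):
    """Translation error (input units) and rotation error (degrees).

    Uses the relative transform inv(T_ref) @ T_est; this is one common
    convention and not necessarily the one used for Table 1.
    """
    T_delta = np.linalg.inv(T_ref) @ T_est
    trans_err = float(np.linalg.norm(T_delta[:3, 3]))
    # Rotation angle from the trace of the relative rotation matrix.
    cos_angle = np.clip((np.trace(T_delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = float(np.degrees(np.arccos(cos_angle)))
    return trans_err, rot_err

# Hypothetical estimate: 0.1 deg yaw offset and a (2 cm, 1 cm) shift.
a = np.radians(0.1)
T_est = np.eye(4)
T_est[:3, :3] = np.array([[np.cos(a), -np.sin(a), 0.0],
                          [np.sin(a), np.cos(a), 0.0],
                          [0.0, 0.0, 1.0]])
T_est[:3, 3] = [0.02, 0.01, 0.0]
trans_err, rot_err = pose_errors(T_est, np.eye(4))
```

An alternative convention compares the translation vectors directly; the two agree when the rotation error is small, as it is for the scenes reported here.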

Table 1.

Transformation estimation errors.

Scene index	Trans. (m)	Rot. (°)
1	0.006	0.035
2	0.003	0.015
3	0.113	0.316
4	0.011	0.063
5	0.005	0.022
6	0.021	0.106
7	0.029	0.157
8	0.039	0.154
9	0.009	0.028
10	0.021	0.101
11	0.023	0.127
12	0.003	0.012
13	0.077	0.455
14	0.005	0.025
15	0.005	0.024
Avg.	0.024	0.109

Results of point cloud colorization

Since acquiring ground-truth point-pixel correspondences remains challenging, quantitative evaluation of colorization accuracy relies on the indirect metrics presented in the previous section. Here we qualitatively validate the results through rendered colorized point clouds. Figures 1 and 3 present the colorization results for multiple structured architectural scenes and for less-structured lawn and garden environments. The colored point clouds are visually consistent with the appearance of the actual environments.

Figure 3.

Point cloud colorization results. (a)-(c) are scene photographs and (d)-(f) are corresponding colored point clouds.

Ablation study on multi-view merging

We conduct an ablation study to validate the significance of the multi-view colored points merging module. For comparison, we implement a baseline framework without this module, where each point’s color is assigned solely by the image corresponding to its first projection instance during scene merging. Figure 4 contrasts representative local details from both methods, showing that the baseline suffers from obvious misalignment and ghosting artifacts (e.g., text boundaries), while our method presents clearer edges and a more faithful representation by selecting optimal coloring sources based on perspective, distance, and projection quality.

Figure 4.

Comparison of local details of the proposed framework with (left column) and without (right column) multi-view merging.

Comparison on rigidly coupled systems

We compare the proposed framework with a coloring scheme based on pre-calibration. Since traditional pre-calibration methods are not applicable to the non-rigidly coupled LiDAR-camera systems that are the focus of this paper, we conduct this comparison on a rigidly coupled system, which also verifies the extensibility of the proposed method. Specifically, the servo motor of the device shown in Figure 1 is locked to form a conventional rigidly coupled system. Two outdoor data sequences, each approximately 100 m long, are collected in the campus environment as test data. The same LiDAR SLAM method (i.e., FAST-LIO2) is adopted for map construction. For the baseline method, after performing manual extrinsic calibration using the state-of-the-art calibration method,³ the raw LiDAR points of each frame are directly colored online with the calibrated extrinsics, and the global colored point cloud is accumulated incrementally. Note that our experimental setup is not equipped with hardware synchronization; thus, the platform moves slowly during data acquisition to minimize the impact of time synchronization errors. In contrast, the proposed framework operates offline: the global map is first constructed, and the colorization is then completed following the pipeline shown in Figure 2.
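The per-frame coloring step of such a pre-calibration baseline can be sketched as follows. This is an illustrative reconstruction under assumptions, not the code used in the experiments: it assumes a fixed 4x4 LiDAR-to-camera extrinsic, a distortion-free pinhole model, and nearest-pixel color sampling. All names are hypothetical.

```python
import numpy as np

def color_scan(points_lidar, image, T_lidar_to_cam, K):
    """Color one LiDAR scan with a fixed extrinsic.

    points_lidar:   (N, 3) points in the LiDAR frame
    image:          (H, W, 3) color image captured at (roughly) the same time
    T_lidar_to_cam: 4x4 fixed extrinsic mapping LiDAR coordinates to the camera frame
    K:              3x3 intrinsic matrix
    Returns the visible points and their sampled colors.
    """
    pts = np.asarray(points_lidar, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0                      # drop points behind the camera
    cam = cam[in_front]
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors = image[v[inside], u[inside]]          # row index is v, column index is u
    return pts[in_front][inside], colors
```

Because every scan is colored with whichever image is nearest in time, any timestamp offset between the two sensors translates directly into a pixel offset, which is consistent with the blur attributed to missing hardware synchronization in the comparison.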

Partial colorization results for the two data sequences are compared in Figure 5. Both methods are capable of accomplishing the global colorization task. However, the baseline method yields inferior color details with blurry local regions, most likely because the rigidly coupled system constructed in this work lacks precise hardware synchronization. This leads to fusion errors between LiDAR and image frames, and such errors accumulate at the same physical location through multi-frame fusion. In contrast, the proposed method achieves sharper local color details, thanks to its one-shot offline coloring based on image-scene pairs and its independence from precise hardware temporal synchronization. We also present quantitative results using the same statistical approach as in Table 1: Table 2 summarizes the errors of the estimated 3D-2D transformations over 15 randomly selected frames. The average error is close to that in Table 1, which further verifies the applicability of the proposed method to different types of LiDAR-camera systems.

Figure 5.

Comparison results on the rigidly coupled system.

Table 2.

Transformation estimation errors on rigidly coupled system.

Scene index	Trans. (m)	Rot. (°)
1	0.032	0.225
2	0.020	0.122
3	0.049	0.105
4	0.059	0.112
5	0.013	0.080
6	0.050	0.038
7	0.032	0.031
8	0.045	0.037
9	0.014	0.193
10	0.008	0.128
11	0.064	0.061
12	0.005	0.121
13	0.058	0.214
14	0.018	0.068
15	0.004	0.040
Avg.	0.031	0.105

Conclusion

We propose a point cloud colorization framework for non-rigidly coupled LiDAR-camera systems. By decomposing the global coloring problem into localized 3D-to-2D transformation estimation tasks, our approach overcomes the absence of fixed extrinsic parameters. A novel coloring quality assessment method is introduced to resolve optimal image selection during multi-view colored points merging. Future work will focus on improving the framework’s robustness across diverse scenarios and validating it on more types of non-rigid systems. We also intend to develop quantitative evaluation benchmarks with ground truth for point cloud coloring and conduct more systematic analysis on parameter effectiveness, so as to further improve the practicality and objectivity of the method.

Footnotes

ORCID iDs

Guojian He

Shuaichen Dong

Qiufeng Cai

Yisha Liu

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 62303085).

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

All data relevant to this study are available upon request. Please contact the corresponding author to access the data.

References

1. Erke, Bin, Yiming, et al. A fast calibration approach for onboard LiDAR-camera systems. Int J Adv Robot Syst 2020; 17(2): 172988142090960. https://doi.org/10.1177/1729881420909606

2. Zhou, Zhang, et al. Line-based targetless camera–LiDAR calibration via extrinsic initialization and multiconstraint driven mixture model. IEEE Trans Instrum Meas 2025; 74: 1–21. https://doi.org/10.1109/tim.2025.3577857

3. Koide, Oishi, Yokozuka, et al. General, single-shot, target-less, and automatic LiDAR-camera extrinsic calibration toolbox. IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 11301–11307.

4. Zheng, Zou, et al. FAST-LIVO2: fast direct LiDAR–inertial–visual odometry. IEEE Trans Robot 2025; 41: 326–346. https://doi.org/10.1109/tro.2024.3502198

5. Lowe, Kim, Cox. Complementary perception for handheld SLAM. IEEE Robot Autom Lett 2018; 3(2): 1104–1111. https://doi.org/10.1109/lra.2018.2795651

6. Kottege. Heterogeneous robot teams with unified perception and autonomy: how Team CSIRO Data61 tied for the top score at the DARPA subterranean challenge. Field Robot 2024; 4: 313–359.

7. Boche, Jung, Laina, et al. OKVIS2-X: open keyframe-based visual-inertial SLAM configurable with dense depth or LiDAR, and GNSS. IEEE Trans Robot 2025; 41: 6064–6083. https://doi.org/10.1109/tro.2025.3619051

8. Wisth, Camurri, Das, et al. Unified multi-modal landmark tracking for tightly coupled Lidar-visual-inertial odometry. IEEE Robot Autom Lett 2021; 6(2): 1004–1011. https://doi.org/10.1109/lra.2021.3056380

9. Huang, Zhang, Chen, et al. Online, target-free LiDAR-camera extrinsic calibration via cross-modal mask matching. IEEE Trans Intell Veh 2025; 10(5): 3531–3542. https://doi.org/10.1109/tiv.2024.3456299

10. Lee, Bong E-J, Kee S-C. Motion-based camera-LiDAR online calibration with 3D camera ground. IEEE Trans Intell Veh 2025; 10(5): 3278–3290. https://doi.org/10.1109/tiv.2024.3451058

11. Luan, Shi, Chen, et al. RLCNet: A novel deep feature-matching-based method for online target-free radar-LiDAR calibration. IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 3146–3152.

12. Ding, Quan, et al. Survey of extrinsic calibration on LiDAR-camera system for intelligent vehicle: challenges, approaches, and trends. IEEE Trans Intell Transp Syst 2024; 25(11): 15342–15366. https://doi.org/10.1109/tits.2024.3419758

13. Cui, Niu, et al. αlidar: an adaptive high-resolution panoramic lidar system. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024, pp. 1515–1529.

14. Park, Moghadam, Kim, et al. Spatiotemporal camera-LiDAR calibration: a targetless and structureless approach. IEEE Robot Autom Lett 2020; 5(2): 1556–1563. https://doi.org/10.1109/lra.2020.2969164

15. Gurumadaiah, Park, Lee, et al. Precise Synchronization Between LiDAR and Multiple Cameras for Autonomous Driving: An Adaptive Approach. IEEE Trans Intell Veh 2025; 10(3): 2152–2162. https://doi.org/10.1109/tiv.2024.3444780

16. Cai, et al. FAST-LIO2: fast direct LiDAR-inertial odometry. IEEE Trans Robot 2022; 38(4): 2053–2073. https://doi.org/10.1109/tro.2022.3141876

17. DeTone, Malisiewicz, Rabinovich. SuperPoint: Self-supervised interest point detection and description. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 224–236.

18. Sarlin P-E, DeTone, Malisiewicz, et al. SuperGlue: learning feature matching with graph neural networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4938–4947.
