This paper focuses on developing efficient and robust evaluation metrics for RANSAC hypotheses to achieve accurate 3D rigid registration. Estimating a six-degree-of-freedom (6-DoF) pose from feature correspondences remains a popular approach to 3D rigid registration, and random sample consensus (RANSAC) is the de facto choice for this problem. However, existing metrics for RANSAC hypotheses are either time-consuming or sensitive to common nuisances, parameter variations, and different application scenarios, which degrades overall registration accuracy and speed. We alleviate this problem by first analyzing the contributions of inliers and outliers, and then proposing several efficient and robust metrics with different design motivations for RANSAC hypotheses. Comparative experiments on four standard datasets with different nuisances and application scenarios verify that the proposed metrics can significantly improve registration performance and are more robust than several state-of-the-art competitors, making them well suited to practical applications. This work also draws an interesting conclusion, namely that not all inliers are equal while all outliers should be equal, which may shed new light on this research problem.
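To make the idea of scoring a RANSAC hypothesis concrete, the following is a minimal sketch of two standard metrics, not the metrics proposed in the paper: a plain inlier count, and a truncated quadratic (MSAC-style) score in which closer inliers contribute more while every outlier contributes the same constant (zero). The function names, the threshold `thresh`, and the row-vector convention for points are assumptions made for illustration.

```python
import numpy as np

def residuals(src, dst, R, t):
    """Distance between each transformed source point and its putative match."""
    return np.linalg.norm(src @ R.T + t - dst, axis=1)

def inlier_count(src, dst, R, t, thresh):
    """Classic RANSAC score: every inlier counts 1, every outlier counts 0."""
    return int(np.sum(residuals(src, dst, R, t) < thresh))

def truncated_score(src, dst, R, t, thresh):
    """MSAC-style score: inliers are weighted by 1 - (r / thresh)^2,
    and all correspondences with r >= thresh contribute exactly 0."""
    r = residuals(src, dst, R, t)
    return float(np.sum(np.clip(1.0 - (r / thresh) ** 2, 0.0, None)))
```

Under these assumptions, a hypothesis whose inliers have small residuals outranks one with the same number of inliers that only barely pass the threshold, which the plain count cannot distinguish.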
Imperfect data (noise, outliers and partial overlap) and high degrees of freedom make non-rigid registration a classical challenging problem in computer vision. Existing methods typically adopt an $\ell_{p}$-type robust estimator to regularize the fitting and smoothness, and use the proximal operator to solve the resulting non-smooth problem. However, the slow convergence of these algorithms limits their wide application. In this paper, we propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization, which can handle outliers and partial overlaps. We apply the majorization-minimization algorithm to the problem, which reduces each iteration to solving a simple least-squares problem with L-BFGS. Extensive experiments demonstrate the effectiveness of our method for non-rigid alignment between two shapes with outliers and partial overlap, with quantitative evaluation showing that it outperforms state-of-the-art methods in terms of registration accuracy and computational speed. The source code is available at https://github.com/Juyong/Fast_RNRR.
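The majorization-minimization step can be illustrated on a deliberately simplified toy problem. The sketch below assumes a Welsch-type function $1 - \exp(-r^2 / (2\nu^2))$ as the smooth robust estimator (the abstract does not name one) and estimates only a translation between matched point sets, not a non-rigid deformation. Because this function is concave in $r^2$, each MM iteration reduces to a weighted least-squares problem with weights $\exp(-r^2 / (2\nu^2))$; here that problem has a closed-form solution, so no L-BFGS solver is needed. The function name and the parameters `nu` and `iters` are hypothetical.

```python
import numpy as np

def welsch_mm_translation(src, dst, nu=0.5, iters=50):
    """Robustly estimate t such that dst ~ src + t, using MM on a Welsch loss."""
    d = dst - src                      # per-correspondence displacement
    t = np.median(d, axis=0)           # robust starting point (MM is a local method)
    for _ in range(iters):
        r2 = np.sum((d - t) ** 2, axis=1)
        w = np.exp(-r2 / (2.0 * nu ** 2))            # surrogate weights at current t
        t = (w[:, None] * d).sum(axis=0) / w.sum()   # weighted least-squares update
    return t
```

Correspondences with large residuals receive weights close to zero, so gross outliers have almost no influence on the update.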
Fluoroscopy is the standard imaging modality used to guide hip surgery and is therefore a natural sensor for computer-assisted navigation. In order to efficiently solve the complex registration problems presented during navigation, human-assisted annotations of the intraoperative image are typically required. This manual initialization interferes with the surgical workflow and diminishes any advantages gained from navigation. We propose a method for fully automatic registration using annotations produced by a neural network. Neural networks are trained to simultaneously segment anatomy and identify landmarks in fluoroscopy. Training data is obtained using an intraoperatively incompatible 2D/3D registration of hip anatomy. Ground truth 2D labels are established using projected 3D annotations. Intraoperative registration couples an intensity-based strategy with annotations inferred by the network and requires no human assistance. Ground truth labels were obtained in 366 fluoroscopic images across 6 cadaveric specimens. In a leave-one-subject-out experiment, networks obtained mean Dice coefficients of 0.86, 0.87, 0.90, and 0.84 for the left hemipelvis, right hemipelvis, left femur, and right femur, respectively. The mean 2D landmark error was 5.0 mm. The pelvis was registered within 1 degree for 86% of the images when using the proposed intraoperative approach, with an average runtime of 7 seconds. In comparison, an intensity-only approach without manual initialization registered the pelvis to within 1 degree in 18% of images. We have created the first accurately annotated, non-synthetic dataset of hip fluoroscopy. By using these annotations as training data for neural networks, state-of-the-art performance in fluoroscopic segmentation and landmark localization was achieved. Integrating these annotations allows for a robust, fully automatic, and efficient intraoperative registration during fluoroscopic navigation of the hip.
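For reference, the Dice coefficient reported above is defined for two binary masks $A$ and $B$ as $2|A \cap B| / (|A| + |B|)$. A minimal sketch follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

For example, a prediction covering two pixels, one of which coincides with a one-pixel ground truth, gives 2 * 1 / (2 + 1), or about 0.67.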
Image-based navigation is widely considered the next frontier of minimally invasive surgery. It is believed that image-based navigation will increase access to reproducible, safe, and high-precision surgery, as it may then be performed at acceptable cost and effort. This is because image-based techniques avoid the need for specialized equipment and integrate seamlessly with contemporary workflows. Further, it is expected that image-based navigation will play a major role in enabling mixed reality environments and autonomous, robotic workflows. A critical component of image guidance is 2D/3D registration, a technique to estimate the spatial relationships between 3D structures, e.g., volumetric imagery or tool models, and 2D images thereof, such as fluoroscopy or endoscopy. While image-based 2D/3D registration is a mature technique, its transition from the bench to the bedside has been restrained by well-known challenges, including brittleness of the optimization objective, hyperparameter selection, and initialization; difficulties around inconsistencies or multiple objects; and limited single-view performance. One reason these challenges persist today is that analytical solutions are likely inadequate considering the complexity, variability, and high dimensionality of generic 2D/3D registration problems. The recent advent of machine learning-based approaches to imaging problems, which approximate the desired functional mapping using highly expressive parametric models rather than specifying it, holds promise for solving some of the notorious challenges in 2D/3D registration. In this manuscript, we review the impact of machine learning on 2D/3D registration to systematically summarize the recent advances made by the introduction of this novel technology. Grounded in these insights, we then offer our perspective on the most pressing needs, significant open problems, and possible next steps.
The rigid registration of two 3D point sets is a fundamental problem in computer vision. The current trend is to solve this problem globally using the branch-and-bound (BnB) optimization framework. However, the existing global methods are slow for two main reasons: the computational complexity of BnB is exponential in the problem dimensionality (which is six for 3D rigid registration), and the bound evaluation used in BnB is inefficient. In this paper, we propose two techniques to address these problems. First, we introduce the idea of translation invariant vectors, which allows us to decompose the search for a 6D rigid transformation into a search for a 3D rotation followed by a search for a 3D translation, each of which is solved by a separate BnB algorithm. This transformation decomposition reduces the problem dimensionality of the BnB algorithms and substantially improves their efficiency. Then, we propose a new data structure, named 3D Integral Volume, to accelerate the bound evaluation in both BnB algorithms. By combining these two techniques, we implement an efficient algorithm for rigid registration of 3D point sets. Extensive experiments on both synthetic and real data show that the proposed algorithm is three orders of magnitude faster than the existing state-of-the-art global methods.
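Both ideas can be sketched in a few lines, under assumptions the abstract does not spell out. The sketch takes translation invariant vectors to be pairwise differences of points within one set: if $y = Rx + t$, then $y_i - y_j = R(x_i - x_j)$, so the rotation can be searched without knowing $t$. It takes the integral volume to be the 3D analogue of a summed-area table, where a cumulative sum along all three axes lets the sum over any axis-aligned box be read off with eight lookups. The paper's actual constructions may differ in detail, and all function names here are hypothetical.

```python
import numpy as np

def translation_invariant_vectors(points):
    """All pairwise differences p_i - p_j (i < j); unchanged by any translation."""
    i, j = np.triu_indices(len(points), k=1)
    return points[i] - points[j]

def integral_volume(vol):
    """Zero-padded cumulative sum: iv[x, y, z] = vol[:x, :y, :z].sum()."""
    iv = np.zeros(tuple(n + 1 for n in vol.shape), dtype=vol.dtype)
    iv[1:, 1:, 1:] = vol.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return iv

def box_sum(iv, lo, hi):
    """Sum of vol over the half-open box [lo, hi) via inclusion-exclusion."""
    (x0, y0, z0), (x1, y1, z1) = lo, hi
    return (iv[x1, y1, z1]
            - iv[x0, y1, z1] - iv[x1, y0, z1] - iv[x1, y1, z0]
            + iv[x0, y0, z1] + iv[x0, y1, z0] + iv[x1, y0, z0]
            - iv[x0, y0, z0])
```

After a single pass to build the table, each box query costs constant time regardless of the box size, which is the property that makes such a structure attractive for repeated bound evaluations.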
Detecting dynamic objects and predicting static road information such as drivable areas and ground heights are crucial for safe autonomous driving. Previous works studied each perception task separately and lacked a collective quantitative analysis. In this work, we show that it is possible to perform all perception tasks via a simple and efficient multi-task network. Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding. The network is based on an encoder-decoder architecture with 3D sparse convolution and deconvolution operations. Extensive experiments show that the proposed method achieves accuracy competitive with state-of-the-art object detectors and other task-specific networks. LidarMTL is also leveraged for online localization. Code and a pre-trained model are available at https://github.com/frankfengdi/LidarMTL.