
Rigid and Articulated Point Registration with Expectation Conditional Maximization

 Added by Radu Horaud P
 Publication date: 2020
 Language: English





This paper addresses the issue of matching rigid and articulated shapes through probabilistic point registration. The problem is recast into a missing data framework where unknown correspondences are handled via mixture models. Adopting a maximum likelihood principle, we introduce an innovative EM-like algorithm, namely the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm. The algorithm allows the use of general covariance matrices for the mixture model components and improves over the isotropic covariance case. We analyse in detail the associated consequences in terms of estimation of the registration parameters, and we propose an optimal method for estimating the rotational and translational parameters based on semi-definite positive relaxation. We extend rigid registration to articulated registration. Robustness is ensured by detecting and rejecting outliers through the addition of a uniform component to the Gaussian mixture model at hand. We provide an in-depth analysis of our method and we compare it both theoretically and experimentally with other robust methods for point registration.
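The outlier handling described in the abstract can be illustrated with a minimal sketch of an E-step for a Gaussian mixture augmented with a uniform component. This is not the ECMPR algorithm itself: it assumes isotropic covariances with a shared `sigma`, equal priors on the Gaussian components, and a uniform outlier density over a region of size `volume`. The function name `e_step` and all parameter names are hypothetical.

```python
import math

def e_step(observations, centers, sigma, outlier_weight, volume):
    """Return, per observation, (posteriors over centers, posterior of outlier class).

    Assumptions (not taken from the paper): isotropic covariance sigma^2 * I,
    equal priors over the Gaussian components, uniform outlier density 1 / volume.
    """
    dim = len(centers[0])
    n = len(centers)
    # Normalisation constant of an isotropic Gaussian in `dim` dimensions.
    norm = (2.0 * math.pi * sigma ** 2) ** (-dim / 2.0)
    results = []
    for y in observations:
        gauss = []
        for mu in centers:
            sq_dist = sum((a - b) ** 2 for a, b in zip(y, mu))
            density = norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
            gauss.append((1.0 - outlier_weight) / n * density)
        uniform = outlier_weight / volume
        total = sum(gauss) + uniform
        results.append(([g / total for g in gauss], uniform / total))
    return results

centers = [(0.0, 0.0), (1.0, 0.0)]
observations = [(0.0, 0.0), (50.0, 50.0)]
posteriors = e_step(observations, centers, sigma=0.1,
                    outlier_weight=0.1, volume=10000.0)
```

In this toy setting the first observation is assigned almost entirely to the first center, while the distant second observation is absorbed by the uniform class, which is the mechanism that keeps outliers from biasing the parameter estimates.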




Read More

Matching articulated shapes represented by voxel-sets reduces to maximal sub-graph isomorphism when each set is described by a weighted graph. Spectral graph theory can be used to map these graphs onto lower dimensional spaces and match shapes by aligning their embeddings by virtue of their invariance to change of pose. Classical graph isomorphism schemes relying on the ordering of the eigenvalues to align the eigenspaces fail when handling large data-sets or noisy data. We derive a new formulation that finds the best alignment between two congruent $K$-dimensional sets of points by selecting the best subset of eigenfunctions of the Laplacian matrix. The selection is done by matching eigenfunction signatures built with histograms, and the retained set provides a smart initialization for the alignment problem with a considerable impact on the overall performance. Dense shape matching cast as graph matching then reduces to point registration of embeddings under orthogonal transformations; the registration is solved using the framework of unsupervised clustering and the EM algorithm. Maximal subset matching of non-identical shapes is handled by defining an appropriate outlier class. Experimental results on challenging examples show how the algorithm naturally treats changes of topology, shape variations and different sampling densities.
Yijia Weng, He Wang, Qiang Zhou, 2021
In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) while running at the fastest speed, about 12 FPS.
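The equivalence between a 9DoF pose and an oriented 3D bounding box can be made concrete with a short sketch. It assumes the rotation is given as a 3x3 matrix (nested lists), the translation is the box centre, and the size holds the full extents along the box axes; these conventions and the name `box_corners` are illustrative assumptions, not taken from the abstract.

```python
from itertools import product

def box_corners(rotation, translation, size):
    """Return the 8 corners of the box described by a 9DoF pose.

    Assumed conventions: `rotation` is a 3x3 matrix, `translation` is the
    box centre, `size` is the full extent along each local axis.
    """
    corners = []
    for signs in product((-0.5, 0.5), repeat=3):
        # Corner in the box's local frame, centred at the origin.
        local = [s * d for s, d in zip(signs, size)]
        # Rotate into the world frame, then translate to the box centre.
        world = tuple(
            sum(rotation[r][c] * local[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        corners.append(world)
    return corners

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners = box_corners(identity, (1.0, 1.0, 1.0), (2.0, 4.0, 6.0))
```

Three rotation parameters, three translation parameters and three size parameters fully determine the eight corners, which is why the two representations carry the same information.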
Object detection when provided image-level labels instead of instance-level labels (i.e., bounding boxes) during training is an important problem in computer vision, since large scale image datasets with instance-level labels are extremely costly to obtain. In this paper, we address this challenging problem by developing an Expectation-Maximization (EM) based object detection method using deep convolutional neural networks (CNNs). Our method is applicable to both the weakly-supervised and semi-supervised settings. Extensive experiments on the PASCAL VOC 2007 benchmark show that (1) in the weakly supervised setting, our method provides significant detection performance improvement over current state-of-the-art methods, and (2) with access to a small number of strongly (instance-level) annotated images, our method can almost match the performance of the fully supervised Fast RCNN. We share our source code at https://github.com/ZiangYan/EM-WSD.
The rigid registration of two 3D point sets is a fundamental problem in computer vision. The current trend is to solve this problem globally using the branch-and-bound (BnB) optimization framework. However, the existing global methods are slow for two main reasons: the computational complexity of BnB is exponential in the problem dimensionality (which is six for 3D rigid registration), and the bound evaluation used in BnB is inefficient. In this paper, we propose two techniques to address these problems. First, we introduce the idea of translation invariant vectors, which allows us to decompose the search of a 6D rigid transformation into a search of 3D rotation followed by a search of 3D translation, each of which is solved by a separate BnB algorithm. This transformation decomposition reduces the problem dimensionality of the BnB algorithms and substantially improves their efficiency. Then, we propose a new data structure, named 3D Integral Volume, to accelerate the bound evaluation in both BnB algorithms. By combining these two techniques, we implement an efficient algorithm for rigid registration of 3D point sets. Extensive experiments on both synthetic and real data show that the proposed algorithm is three orders of magnitude faster than the existing state-of-the-art global methods.
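The abstract does not say how the translation invariant vectors are built. One common construction, assumed here purely for illustration, is to take differences between pairs of points: if y = R x + t, then y_i - y_j = R (x_i - x_j), so the translation cancels and the rotation can be searched on its own. The helper names below are hypothetical, and the rotation is restricted to the z axis to keep the sketch short.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rigid_transform(point, angle, translation):
    """Apply the rotation, then add the translation."""
    rotated = rotate_z(point, angle)
    return tuple(r + t for r, t in zip(rotated, translation))

def pairwise_differences(points):
    """Difference vectors p_i - p_j for all i < j (assumed construction)."""
    return [
        tuple(a - b for a, b in zip(points[i], points[j]))
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]

source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
angle, translation = 0.7, (5.0, -3.0, 2.0)
target = [rigid_transform(p, angle, translation) for p in source]

# Differences of the transformed points equal the rotated differences
# of the original points: the translation has dropped out.
rotated_only = [rotate_z(v, angle) for v in pairwise_differences(source)]
moved = pairwise_differences(target)
max_error = max(
    abs(a - b) for u, v in zip(rotated_only, moved) for a, b in zip(u, v)
)
```

Once the rotation has been estimated from such vectors, the translation follows from the remaining three degrees of freedom, which is the decomposition the abstract refers to.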
Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). Since only the bag's label is known, the main challenge is deciding which key instances within the bag trigger the bag's label. Most previous models use attention-based approaches, applying attention to generate the bag's representation from instances and then training it via bag classification. These models, however, implicitly violate the MIL assumption that instances in negative bags should be uniformly negative. In this work, we explicitly model the key instance assignment as a hidden variable and adopt an Expectation-Maximization (EM) framework. We derive two pseudo-label generation schemes to model the E and M process and iteratively optimize the likelihood lower bound. We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions. It achieves state-of-the-art performance on two standard benchmarks, THUMOS14 and ActivityNet1.2.