In this paper, we present an approach that reconstructs 3-D human motion from multiple cameras and tracks the human skeleton using the reconstructed 3-D point (voxel) cloud. We use an improved and more robust algorithm, probabilistic shape from silhouette, to reconstruct the human voxels. In addition, an annealed particle filter is applied for tracking, where the measurement is computed from the reprojection of the reconstructed voxels. We accelerate the approach in two different ways. For CPU-only acceleration, we use Intel TBB to speed up the computational hot spots and reach a speedup of 3.5 on a 4-core CPU. Moreover, we implement a massively parallel version via GPU acceleration, without TBB. Taking into account all data transfer and computing time, the GPU version is about 400 times faster than the original CPU implementation, allowing the approach to run in real time.
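As a rough illustration of the shape-from-silhouette idea, the sketch below keeps a voxel when the fused foreground probability across all views is high enough. It is a minimal sketch under stated assumptions, not the algorithm of the paper: fusing views by multiplying per-pixel foreground probabilities, the `threshold` value, and the names `voxel_occupancy` and `carve` are all hypothetical, and each camera is modelled as a plain function from a 3-D point to a pixel.

```python
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[int, int]
ProbMap = Sequence[Sequence[float]]  # prob_map[row][col] = P(foreground)


def voxel_occupancy(voxel: Point3,
                    cameras: Sequence[Callable[[Point3], Pixel]],
                    prob_maps: Sequence[ProbMap]) -> float:
    """Fuse per-view foreground probabilities for one voxel.

    Assumption: views are treated as independent, so the fused value is
    the product of the probabilities at the projected pixels.
    """
    p = 1.0
    for project, pmap in zip(cameras, prob_maps):
        u, v = project(voxel)  # u = column, v = row
        if 0 <= v < len(pmap) and 0 <= u < len(pmap[0]):
            p *= pmap[v][u]
        else:
            return 0.0  # a voxel outside any view is treated as empty
    return p


def carve(voxels: Sequence[Point3],
          cameras: Sequence[Callable[[Point3], Pixel]],
          prob_maps: Sequence[ProbMap],
          threshold: float = 0.5) -> List[Point3]:
    """Keep the voxels whose fused occupancy reaches the threshold."""
    return [x for x in voxels
            if voxel_occupancy(x, cameras, prob_maps) >= threshold]


# Two toy orthographic cameras: one looks along z, one along x.
front = lambda p: (int(p[0]), int(p[1]))
side = lambda p: (int(p[2]), int(p[1]))

# 3x3 probability maps with a confident foreground pixel in the centre.
pmap = [[0.1, 0.1, 0.1],
        [0.1, 0.9, 0.1],
        [0.1, 0.1, 0.1]]

grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
occupied = carve(grid, [front, side], [pmap, pmap])
```

Each voxel is processed independently of the others, which is the property that makes this kind of reconstruction a natural fit for the multi-core and GPU parallelisation described above.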
A considerable limitation of employing sparse voxel octrees (SVOs) as a model format for ray tracing has been that the octree data structure is inherently static. Because traversal algorithms depend on the strict hierarchical structure of octrees, it has been challenging to animate SVO models in real time in a ray tracer, since the octree data structure would typically have to be regenerated every frame. This article presents a novel method for animating models specified in the SVO format. The method distinguishes itself by permitting model transformations such as rotation, translation, and anisotropic scaling, while preserving the hierarchical structure of SVO models so that they may be efficiently traversed. Due to its modest memory footprint and straightforward arithmetic operations, the method is well suited for implementation in hardware. A software ray tracing implementation of animated SVO models demonstrates real-time performance on current-generation desktop GPUs, and shows that the animation method does not substantially slow down rendering compared to rendering static SVOs.
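The abstract does not spell out how the transformations are applied. One standard way to render a transformed model without rebuilding a static acceleration structure, shown here only as an illustrative stand-in and not as the article's method, is to apply the inverse model transform to the ray and traverse the unchanged octree in model space. The sketch assumes a model transform of the form p_world = R · S · p_model + t, with a rotation R, a per-axis (anisotropic) scale S, and a translation t; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rot_z(angle: float) -> List[List[float]]:
    """Rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m: Sequence[Sequence[float]]) -> List[List[float]]:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def world_ray_to_model(origin: Vec3, direction: Vec3,
                       rotation: Sequence[Sequence[float]],
                       scale: Vec3, translation: Vec3) -> Tuple[Vec3, Vec3]:
    """Map a world-space ray into model space.

    Inverts p_world = R * S * p_model + t. The direction is deliberately
    not renormalised, so a ray parameter found in model space is also
    valid along the original world-space ray.
    """
    rt = transpose(rotation)  # inverse of a rotation matrix
    shifted = tuple(origin[i] - translation[i] for i in range(3))
    o = mat_vec(rt, shifted)
    d = mat_vec(rt, direction)
    o = tuple(o[i] / scale[i] for i in range(3))
    d = tuple(d[i] / scale[i] for i in range(3))
    return o, d


# Model rotated 90 degrees about z, stretched 2x along x, moved by (1, 0, 0).
R = rot_z(math.pi / 2)
o_model, d_model = world_ray_to_model(
    origin=(1.0, 2.0, 0.0), direction=(0.0, 1.0, 0.0),
    rotation=R, scale=(2.0, 1.0, 1.0), translation=(1.0, 0.0, 0.0))
```

With this formulation the per-frame cost is a handful of multiply-adds per ray, which is consistent with the abstract's remark about straightforward arithmetic, although the actual method may differ.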
Estimating 3D poses of multiple humans in real time is a classic but still challenging task in computer vision. Its major difficulty lies in the ambiguity of cross-view association of 2D poses and the huge state space when there are multiple people in multiple views. In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views. It takes 2D poses in different camera coordinates as inputs and aims for accurate 3D poses in the global coordinate system. Unlike previous methods that associate 2D poses among all pairs of views from scratch at every frame, we exploit the temporal consistency in videos to match the 2D inputs with 3D poses directly in 3-space. More specifically, we propose to retain the 3D pose of each person and update it iteratively via cross-view multi-human tracking. This novel formulation improves both accuracy and efficiency, as we demonstrate on widely used public datasets. To further verify the scalability of our method, we propose a new large-scale multi-human dataset with 12 to 28 camera views. Without bells and whistles, our solution achieves 154 FPS on 12 cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale real-world applications. The proposed dataset is released at https://github.com/longcw/crossview_3d_pose_tracking.
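The idea of matching new 2D detections against retained 3D poses, rather than against every other view, can be sketched as follows. This is a simplified stand-in and not the paper's formulation: the paper matches directly in 3-space, whereas this sketch scores each pair by mean reprojection error in the image and assigns pairs greedily. The pinhole parameters, the `max_error` threshold, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, focal: float = 1000.0,
            cx: float = 640.0, cy: float = 360.0) -> Point2:
    """Pinhole projection of a point given in camera coordinates (z > 0)."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def mean_reproj_error(pose3d: Sequence[Point3],
                      pose2d: Sequence[Point2]) -> float:
    """Mean pixel distance between projected 3D joints and 2D joints."""
    errs = [math.dist(project(p), q) for p, q in zip(pose3d, pose2d)]
    return sum(errs) / len(errs)


def greedy_match(tracks: Sequence[Sequence[Point3]],
                 detections: Sequence[Sequence[Point2]],
                 max_error: float = 50.0) -> Dict[int, int]:
    """Greedily pair tracked 3D poses with 2D detections in one view.

    Returns {track_index: detection_index}; pairs whose error exceeds
    max_error stay unmatched.
    """
    pairs = sorted(
        (mean_reproj_error(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections))
    used_t, used_d = set(), set()
    matches: Dict[int, int] = {}
    for err, ti, di in pairs:
        if err > max_error:
            break  # pairs are sorted, so all remaining ones are worse
        if ti in used_t or di in used_d:
            continue
        matches[ti] = di
        used_t.add(ti)
        used_d.add(di)
    return matches


# Two tracked people (two joints each) and three detections in one view.
tracks: List[List[Point3]] = [
    [(0.0, 0.0, 5.0), (0.0, 1.0, 5.0)],
    [(2.0, 0.0, 5.0), (2.0, 1.0, 5.0)],
]
detections: List[List[Point2]] = [
    [(1042.0, 361.0), (1039.0, 558.0)],  # close to track 1
    [(641.0, 359.0), (640.0, 562.0)],    # close to track 0
    [(100.0, 100.0), (100.0, 300.0)],    # a new, untracked person
]
matches = greedy_match(tracks, detections)
```

For one view, the cost of this step grows with the number of tracked people times the number of detections, rather than with the number of view pairs. A detection left unmatched, like the third one here, would be a candidate for starting a new track.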
Mesh reconstruction from a 3D point cloud is an important topic in the fields of computer graphics, computer vision, and multimedia analysis. In this paper, we propose a voxel structure-based mesh reconstruction framework. It provides an intrinsic metric to improve the accuracy of local region detection. Based on the detected local regions, an initial reconstructed mesh can be obtained. The mesh optimization step in our framework then turns the initial mesh into an isotropic one that retains important geometric features such as external and internal edges. The experimental results indicate that our framework shows great advantages over peer methods in terms of mesh quality, geometric feature preservation, and processing speed.
We show that dense voxel embeddings learned via deep metric learning can be employed to produce a highly accurate segmentation of neurons from 3D electron microscopy images. A metric graph on a set of edges between voxels is constructed from the dense voxel embeddings generated by a convolutional network. Partitioning the metric graph with long-range edges as repulsive constraints yields an initial segmentation with high precision, with a substantial accuracy gain for very thin objects. The convolutional embedding net is reused without any modification to agglomerate the systematic splits caused by complex self-contact motifs. Our proposed method achieves state-of-the-art accuracy on the challenging problem of 3D neuron reconstruction from brain images acquired by serial section electron microscopy. Our alternative, object-centered representation could be more generally useful for other computational tasks in automated neural circuit reconstruction.
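To make the notion of a metric graph concrete, the sketch below builds edges between voxels at a set of offsets and weights each edge by the distance between the two voxel embeddings. It is a minimal sketch under assumptions: the Euclidean distance, the toy 1-D voxel line, and the name `metric_graph` are illustrative choices, and the partitioning step with repulsive constraints is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, ...]
Edge = Tuple[Voxel, Voxel, float]


def metric_graph(embeddings: Dict[Voxel, Sequence[float]],
                 offsets: Sequence[Voxel]) -> List[Edge]:
    """Connect each voxel to its neighbours at the given offsets.

    The edge weight is the Euclidean distance between the two embedding
    vectors: small for voxels likely to belong to the same object, large
    for voxels likely to belong to different objects.
    """
    edges: List[Edge] = []
    for voxel, vec in embeddings.items():
        for off in offsets:
            nb = tuple(v + o for v, o in zip(voxel, off))
            if nb in embeddings:
                edges.append((voxel, nb, math.dist(vec, embeddings[nb])))
    return edges


# Four voxels on a line: the first two and the last two embed close together.
emb = {
    (0,): (0.0, 0.0),
    (1,): (0.1, 0.0),
    (2,): (1.0, 0.0),
    (3,): (1.0, 0.1),
}
# Offset 1 gives nearest-neighbour edges, offset 2 gives long-range edges.
edges = metric_graph(emb, offsets=[(1,), (2,)])
```

In this toy example the long-range edge from `(0,)` to `(2,)` has a large weight, so a partitioning step could treat it as evidence that the two voxels belong to different objects.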
Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often cause discomfort when marker suits are required, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several inertial measurement units or an exoskeleton to capture motion. This makes capture independent of a confined volume, but requires substantial body instrumentation that is often constraining and hard to set up. We therefore propose a new method for real-time, marker-less, egocentric motion capture, which estimates the full-body skeleton pose from a lightweight stereo pair of fisheye cameras attached to a helmet or virtual reality headset. It combines the strength of a new generative pose estimation framework for fisheye views with a ConvNet-based body-part detector trained on a large new dataset. Our inside-in method captures full-body motion in general indoor and outdoor scenes, including crowded scenes with many people in close vicinity. The captured user can move around freely, which enables reconstruction of larger-scale activities and is particularly useful in virtual reality, where the user can roam and interact freely while seeing the fully motion-captured virtual body.