No Arabic abstract
Measuring the respiratory signal from a video based on body motion has been proposed and recently matured in products for video health monitoring. The core algorithm for this measurement is the estimation of tiny chest/abdominal motions induced by respiration, and the fundamental challenge is motion sensitivity. Though prior arts reported on the validation with real human subjects, there is no thorough/rigorous benchmark to quantify the sensitivities and boundary conditions of motion-based core respiratory algorithms that measure sub-pixel displacement between video frames. In this paper, we designed a setup with a fully-controllable physical phantom to investigate the essence of core algorithms, together with a mathematical model incorporating two motion estimation strategies and three spatial representations, leading to six algorithmic combinations for respiratory signal extraction. Their promises and limitations are discussed and clarified via the phantom benchmark. The insights gained in this paper are intended to improve the understanding and applications of camera-based respiration measurement in health monitoring.
In this work, we focus on using convolution neural networks (CNN) to perform object recognition on the event data. In object recognition, it is important for a neural network to be robust to the variations of the data during testing. For traditional cameras, translations are well handled because CNNs are naturally equivariant to translations. However, because event cameras record the change of light intensity of an image, the geometric shape of event volumes will not only depend on the objects but also on their relative motions with respect to the camera. The deformation of the events caused by motions causes the CNN to be less robust to unseen motions during inference. To address this problem, we would like to explore the equivariance property of CNNs, a well-studied area that demonstrates to produce predictable deformation of features under certain transformations of the input image.
This letter presents a novel approach to extract reliable dense and long-range motion trajectories of articulated human in a video sequence. Compared with existing approaches that emphasize temporal consistency of each tracked point, we also consider the spatial structure of tracked points on the articulated human. We treat points as a set of vertices, and build a triangle mesh to join them in image space. The problem of extracting long-range motion trajectories is changed to the issue of consistency of mesh evolution over time. First, self-occlusion is detected by a novel mesh-based method and an adaptive motion estimation method is proposed to initialize mesh between successive frames. Furthermore, we propose an iterative algorithm to efficiently adjust vertices of mesh for a physically plausible deformation, which can meet the local rigidity of mesh and silhouette constraints. Finally, we compare the proposed method with the state-of-the-art methods on a set of challenging sequences. Evaluations demonstrate that our method achieves favorable performance in terms of both accuracy and integrity of extracted trajectories.
We attempt to overcome the restriction of requiring a writing surface for handwriting recognition. In this study, we design a prototype of a stylus equipped with motion sensor, and utilizes gyroscopic and acceleration sensor reading to perform written letter classification using various deep learning techniques such as CNN and RNNs. We also explore various data augmentation techniques and their effects, reaching up to 86% accuracy.
Robust and accurate camera calibration is essential for 3D reconstruction in light microscopy under circular motion. Conventional methods require either accurate key point matching or precise segmentation of the axial-view images. Both remain challenging because specimens often exhibit transparency/translucency in a light microscope. To address those issues, we propose a probabilistic inference based method for the camera calibration that does not require sophisticated image pre-processing. Based on 3D projective geometry, our method assigns a probability on each of a range of voxels that cover the whole object. The probability indicates the likelihood of a voxel belonging to the object to be reconstructed. Our method maximizes a joint probability that distinguishes the object from the background. Experimental results show that the proposed method can accurately recover camera configurations in both light microscopy and natural scene imaging. Furthermore, the method can be used to produce high-fidelity 3D reconstructions and accurate 3D measurements.
Understanding ego-motion and surrounding vehicle state is essential to enable automated driving and advanced driving assistance technologies. Typical approaches to solve this problem use fusion of multiple sensors such as LiDAR, camera, and radar to recognize surrounding vehicle state, including position, velocity, and orientation. Such sensing modalities are overly complex and costly for production of personal use vehicles. In this paper, we propose a novel machine learning method to estimate ego-motion and surrounding vehicle state using a single monocular camera. Our approach is based on a combination of three deep neural networks to estimate the 3D vehicle bounding box, depth, and optical flow from a sequence of images. The main contribution of this paper is a new framework and algorithm that integrates these three networks in order to estimate the ego-motion and surrounding vehicle state. To realize more accurate 3D position estimation, we address ground plane correction in real-time. The efficacy of the proposed method is demonstrated through experimental evaluations that compare our results to ground truth data available from other sensors including Can-Bus and LiDAR.