Optical flow estimation under occlusion or large displacement is a challenging problem due to the loss of corresponding pixels between consecutive frames. In this paper, we show that because of this lost information, a large fraction of the motion features (more than 40%) computed from the popular discriminative cost-volume representation vanish completely due to invalid sampling, leading to inefficient optical flow learning. We call this phenomenon the Vanishing Cost Volume Problem. Inspired by the fact that local motion tends to be highly consistent within a short temporal window, we propose a novel iterative Motion Feature Recovery (MFR) method that addresses the vanishing cost volume by modeling motion consistency across multiple frames. In each MFR iteration, invalid entries of the original motion features are first identified based on the current flow. An efficient network then adaptively learns the motion correlation to recover the invalid features and restore the lost information. The final optical flow is decoded from the recovered motion features. Experimental results on Sintel and KITTI show that our method achieves state-of-the-art performance. In fact, MFR currently ranks second on the Sintel public benchmark website.
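To make the invalid-sampling effect concrete, the following toy sketch (not the authors' code; the 8x8 grid, uniform 5-pixel flow, and search radius are illustrative assumptions) counts how many cost-volume entries would be sampled outside the second frame under a large displacement:

```python
import numpy as np

def cost_volume_validity(h, w, flow, radius):
    """For each pixel, test whether each flow-displaced sample in a
    (2*radius+1)^2 correlation window lands inside the second frame.
    Out-of-frame samples produce invalid (vanishing) cost-volume
    entries."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    tx = xs + flow[..., 0]                    # displaced x coordinates
    ty = ys + flow[..., 1]                    # displaced y coordinates
    valid = np.zeros((h, w, 2 * radius + 1, 2 * radius + 1), dtype=bool)
    for i, dy in enumerate(range(-radius, radius + 1)):
        for j, dx in enumerate(range(-radius, radius + 1)):
            valid[..., i, j] = ((tx + dx >= 0) & (tx + dx <= w - 1) &
                                (ty + dy >= 0) & (ty + dy <= h - 1))
    return valid

# a uniform large displacement pushes many samples outside the frame
flow = np.full((8, 8, 2), 5.0)
v = cost_volume_validity(8, 8, flow, radius=1)
frac_invalid = 1.0 - v.mean()
```

Even in this small example, well over 40% of the correlation entries are invalid, matching the scale of the vanishing-feature fraction the abstract reports.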
Yang Jiao, Yi Niu, Trac D. Tran (2020)
In 2D+3D facial expression recognition (FER), existing methods generate multi-view geometry maps to enhance the depth feature representation. However, this may introduce false estimates due to local plane fitting on incomplete point clouds. In this paper, we propose a novel Map Generation technique, from the viewpoint of information theory, to amplify the subtle 3D expression differences against strong identity variations. First, we examine the HDR depth data to extract the discriminative dynamic range $r_{dis}$, and maximize the entropy of $r_{dis}$ to a global optimum. Then, to prevent the large deformation caused by over-enhancement, we introduce a depth distortion constraint and reduce the complexity from $O(KN^2)$ to $O(KN\tau)$. Furthermore, the constrained optimization is modeled as a $K$-edges maximum-weight path problem in a directed acyclic graph, which we solve efficiently via dynamic programming. Finally, we design an efficient Facial Attention structure to automatically locate subtle, discriminative facial parts for multi-scale learning, and train it with a proposed loss function $\mathcal{L}_{FA}$ without any facial landmarks. Experimental results on different datasets show that the proposed method is effective and outperforms state-of-the-art 2D+3D FER methods in both FER accuracy and the output entropy of the generated maps.
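The $K$-edges maximum-weight path problem the abstract reduces to has a standard dynamic-programming solution. The sketch below (a generic illustration with arbitrary toy weights, not the paper's map-generation objective) tracks the best path weight ending at each node with exactly $k$ edges:

```python
def k_edge_max_weight_path(w, K):
    """DP for the K-edges maximum-weight path in a DAG over nodes
    0..N-1, where w[i][j] is the weight of edge i -> j (i < j).
    dp[k][j] holds the best weight of a path from node 0 to j that
    uses exactly k edges."""
    N = len(w)
    NEG = float("-inf")
    dp = [[NEG] * N for _ in range(K + 1)]
    dp[0][0] = 0.0                           # path starts at node 0
    for k in range(1, K + 1):
        for j in range(1, N):
            dp[k][j] = max((dp[k - 1][i] + w[i][j]
                            for i in range(j) if dp[k - 1][i] > NEG),
                           default=NEG)
    return max(dp[K])                        # best K-edge path to any node

# toy weights: w[i][j] = (j - i) ** 2 on 4 nodes, paths of exactly 2 edges
N, K = 4, 2
w = [[(j - i) ** 2 for j in range(N)] for i in range(N)]
best = k_edge_max_weight_path(w, K)
```

Each of the $K$ DP layers scans $O(N)$ predecessors per node, so the full table costs $O(KN^2)$; the paper's distortion constraint is what prunes each scan to a window of width $\tau$, giving the reported $O(KN\tau)$.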
This paper addresses the challenging unsupervised scene flow estimation problem by jointly learning four low-level vision sub-tasks: optical flow $\mathbf{F}$, stereo depth $\mathbf{D}$, camera pose $\mathbf{P}$, and motion segmentation $\mathbf{S}$. Our key insight is that the rigidity of the scene shares the same inherent geometrical structure with object movements and scene depth. Hence, rigidity from $\mathbf{S}$ can be inferred by jointly coupling $\mathbf{F}$, $\mathbf{D}$, and $\mathbf{P}$ to achieve more robust estimation. To this end, we propose a novel scene flow framework named EffiScene with efficient joint rigidity learning, going beyond the existing pipeline with independent auxiliary structures. In EffiScene, we first estimate optical flow and depth at the coarse level and then compute the camera pose with the Perspective-$n$-Point method. To jointly learn local rigidity, we design a novel Rigidity From Motion (RfM) layer with three principal components: (i) correlation extraction, (ii) boundary learning, and (iii) outlier exclusion. Final outputs are fused based on the rigid map $M_R$ from RfM at finer levels. To efficiently train EffiScene, two new losses $\mathcal{L}_{bnd}$ and $\mathcal{L}_{unc}$ are designed to prevent trivial solutions and to regularize the flow boundary discontinuity. Extensive experiments on the scene flow benchmark KITTI show that our method is effective and significantly improves on the state-of-the-art approaches for all sub-tasks, i.e., optical flow ($5.19 \rightarrow 4.20$), depth estimation ($3.78 \rightarrow 3.46$), visual odometry ($0.012 \rightarrow 0.011$), and motion segmentation ($0.57 \rightarrow 0.62$).
In order to reduce hardware complexity and power consumption, massive multiple-input multiple-output (MIMO) systems employ low-resolution analog-to-digital converters (ADCs) to acquire quantized measurements $\boldsymbol{y}$. This poses new challenges to the channel estimation problem, and the sparse prior on the channel coefficient vector $\boldsymbol{x}$ in the angle domain is often used to compensate for the information lost during quantization. By interpreting the sparse prior from a probabilistic perspective, we can assume $\boldsymbol{x}$ follows a certain sparse prior distribution and recover it using approximate message passing (AMP). However, the distribution parameters are unknown in practice and need to be estimated. Due to the increased computational complexity of the quantization noise model, previous works either use an approximated noise model or manually tune the noise distribution parameters. In this paper, we treat both signals and parameters as random variables and recover them jointly within the AMP framework. The proposed approach leads to a much simpler parameter estimation method, allowing us to work with the quantization noise model directly. Experimental results show that the proposed approach achieves state-of-the-art performance under various noise levels and does not require parameter tuning, making it a practical and maintenance-free approach for channel estimation.
Shuai Huang, Trac D. Tran (2020)
1-bit compressive sensing aims to recover sparse signals from quantized 1-bit measurements. Designing efficient approaches that can handle noisy 1-bit measurements is important in a variety of applications. In this paper we use approximate message passing (AMP) to achieve this goal, owing to its high computational efficiency and state-of-the-art performance. In AMP the signal of interest is assumed to follow some prior distribution, and its posterior distribution can be computed and used to recover the signal. In practice, the parameters of the prior distributions are often unknown and need to be estimated. Previous works tried to find the parameters that maximize either the measurement likelihood or the Bethe free entropy, which becomes increasingly difficult to solve for complicated probability models. Here we propose to treat the parameters as unknown variables and compute their posteriors via AMP as well, so that the parameters and the signal can be recovered jointly. This leads to a much simpler way to perform parameter estimation than previous methods and enables us to work with noisy 1-bit measurements. We further extend the proposed approach to the general quantization noise model that outputs multi-bit measurements. Experimental results show that the proposed approach generally performs much better than other state-of-the-art methods in the zero-noise and moderate-noise regimes, and outperforms them in most cases in the high-noise regime.
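For readers unfamiliar with AMP, here is the textbook linear-measurement variant with a soft-thresholding denoiser and Onsager correction. This is only the unquantized baseline the paper builds on; the threshold rule and the tuning constant `alpha` are common heuristic choices, and the paper's actual contribution (1-bit/multi-bit noise models with joint parameter estimation) is not reproduced here:

```python
import numpy as np

def amp_soft_threshold(y, A, iters=50, alpha=1.5):
    """Basic AMP for sparse recovery from y = A x: alternate a
    soft-thresholding denoising step with a residual update that
    carries the Onsager correction term."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        r = x + A.T @ z                               # pseudo-data estimate
        tau = alpha * np.linalg.norm(z) / np.sqrt(m)  # threshold from residual level
        x = np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)  # soft threshold
        z = y - A @ x + (z / m) * np.count_nonzero(x)      # Onsager term
    return x

rng = np.random.default_rng(0)
m, n, k = 100, 200, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = amp_soft_threshold(A @ x_true, A)
```

The Onsager term `(z / m) * count_nonzero(x)` is what distinguishes AMP from plain iterative thresholding: it decorrelates successive residuals so the pseudo-data behave like the true signal plus Gaussian noise.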
Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs a generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of the bootstrap sampling ratio L/m and the number of estimates K are analyzed theoretically. Simulation shows that the proposed method yields state-of-the-art recovery performance, outperforming L1 minimization and Bolasso in the challenging case of low levels of measurements. A lower L/m ratio (60% - 90%) leads to better performance, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. With a properly chosen sampling ratio, a reasonably small number of estimates K = 30 gives satisfying results, although increasing K always improves, or at least maintains, the performance.
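The overall procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: ISTA stands in for the generic L1 solver, and the regularization weight `lam`, problem sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

def ista(y, A, lam=0.05, iters=300):
    """Plain ISTA solver for the LASSO; stands in for the L1 baseline."""
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - y) / L          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return x

def bagged_l1(y, A, K=30, ratio=0.7, seed=0):
    """Generalized Bagging for sparse regression: draw K bootstrap
    subsets of size L = ratio * m (with replacement), solve a LASSO on
    each subset, and average the K estimates."""
    rng = np.random.default_rng(seed)
    m = len(y)
    L = int(ratio * m)
    estimates = []
    for _ in range(K):
        idx = rng.integers(0, m, size=L)       # bootstrap sample of rows
        estimates.append(ista(y[idx], A[idx]))
    return np.mean(estimates, axis=0)

rng = np.random.default_rng(1)
m, n, k = 60, 80, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = 1.0
x_bag = bagged_l1(A @ x_true, A)
```

Averaging over bootstrap estimates reduces the variance of the unstable individual LASSO solutions, which is where the reported gains at low measurement levels come from.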
Ultra-wideband (UWB) radar systems nowadays typically operate in the low frequency spectrum to achieve penetration capability. However, this spectrum is also shared by many other communication systems, which causes missing information in the occupied frequency bands. To recover this missing spectral information, we propose a generative adversarial network, called SARGAN, that learns the relationship between original and missing-band signals from such training pairs. Initial results show that this approach is promising in tackling this challenging missing-band problem.
Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing, or they may arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurement vectors are generated from the original pool of measurements via bootstrapping, and a joint-sparse constraint is then enforced to ensure support consistency among the multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of the number of bootstrap samples $L$ and the number of estimates $K$ are analyzed theoretically. Simulation results validate some of the theoretical analysis and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and several other bootstrap-based techniques in the challenging case of low levels of measurements. JOBS is also preferable to other bagging-based methods in the streaming setting, since it performs better with small $K$ and $L$ on large datasets.
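The joint-sparse constraint that distinguishes JOBS from plain bagging is typically expressed through an $\ell_{2,1}$ mixed norm on the matrix of stacked predictors. A minimal sketch (illustrative vectors only, not the paper's solver) shows why penalizing this norm favors a shared support:

```python
import numpy as np

def l21_norm(X):
    """Row-wise l2,1 mixed norm: the sum of the l2 norms of the rows
    of X. If column j of X holds the j-th bootstrap predictor, this
    penalty couples the predictors so they favor one common support."""
    return np.linalg.norm(X, axis=1).sum()

shared = np.array([[1.0, 1.0],     # both predictors use coefficient 0
                   [0.0, 0.0]])
spread = np.array([[1.0, 0.0],     # predictors disagree on support
                   [0.0, 1.0]])
```

Both matrices carry the same total energy, yet the shared-support configuration has the strictly smaller $\ell_{2,1}$ value ($\sqrt{2}$ versus $2$), so minimizing the norm drives the $K$ predictors toward support consistency.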
Mass segmentation provides effective morphological features which are important for mass diagnosis. In this work, we propose a novel end-to-end network for mammographic mass segmentation which employs a fully convolutional network (FCN) to model a potential function, followed by a CRF to perform structured learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a positional prior. Furthermore, we employ adversarial training to reduce over-fitting caused by the small sizes of mammogram datasets, and a multi-scale FCN is employed to improve the segmentation performance. Experimental results on two public datasets, INbreast and DDSM-BCRP, demonstrate that our end-to-end network achieves better performance than state-of-the-art approaches. Code is available at https://github.com/wentaozhu/adversarial-deep-structural-networks.git
Compressive sensing relies on the sparse prior imposed on the signal of interest to solve the ill-posed recovery problem in an under-determined linear system. The objective function used to enforce the sparse prior information should be both effective and easily optimizable. Motivated by the entropy concept from information theory, in this paper we propose the generalized Shannon entropy function and Rényi entropy function of the signal as sparsity-promoting regularizers. Both entropy functions are nonconvex and non-separable, and their local minima occur only on the boundaries of the orthants of the Euclidean space. Compared to other popular objective functions, minimizing the generalized entropy functions adaptively promotes multiple high-energy coefficients while suppressing the remaining low-energy coefficients. The corresponding optimization problems can be recast as a series of reweighted $\ell_1$-norm minimization problems and then solved efficiently by adapting FISTA. Sparse signal recovery experiments on both simulated and real data show that the proposed entropy-minimization approaches perform better than other popular approaches and achieve state-of-the-art performance.
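To see why an entropy function promotes sparsity, the following sketch evaluates a generalized Shannon entropy of the normalized coefficient magnitudes. This is only an illustration of the principle; the exact functional form, the exponent `p`, and the smoothing constant `eps` are assumptions, not the paper's definitions:

```python
import numpy as np

def shannon_entropy_fn(x, p=1.0, eps=1e-12):
    """Entropy of the normalized |x_i|**p distribution. Lower values
    indicate energy concentrated on few coefficients, so using this
    as a regularizer promotes sparsity."""
    u = np.abs(x) ** p
    u = u / (u.sum() + eps)                  # normalize to a distribution
    return -np.sum(u * np.log(u + eps))      # Shannon entropy

sparse = np.array([3.0, 0.0, 0.0, 0.0])      # energy on one coefficient
dense  = np.array([1.0, 1.0, 1.0, 1.0])      # energy spread evenly
```

The sparse vector scores an entropy near 0 while the dense one scores $\log 4$, so descending on this objective pushes energy onto a few dominant coefficients, exactly the behavior the abstract describes.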