ترغب بنشر مسار تعليمي؟ اضغط هنا

Fourier Series Expansion Based Filter Parametrization for Equivariant Convolutions

88   0   0.0 ( 0 )
 نشر من قبل Qi Xie
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

It has been shown that equivariant convolution is very helpful for many types of computer vision tasks. Recently, the 2D filter parametrization technique plays an important role when designing equivariant convolutions. However, the current filter parametrization method still has its evident drawbacks, where the most critical one lies in the accuracy problem of filter representation. Against this issue, in this paper we modify the classical Fourier series expansion for 2D filters, and propose a new set of atomic basis functions for filter parametrization. The proposed filter parametrization method not only finely represents 2D filters with zero error when the filter is not rotated, but also substantially alleviates the fence-effect-caused quality degradation when the filter is rotated. Accordingly, we construct a new equivariant convolution method based on the proposed filter parametrization method, named F-Conv. We prove that the equivariance of the proposed F-Conv is exact in the continuous domain, which becomes approximate only after discretization. Extensive experiments show the superiority of the proposed method. Particularly, we adopt rotation equivariant convolution methods to image super-resolution task, and F-Conv evidently outperforms previous filter parametrization based method in this task, reflecting its intrinsic capability of faithfully preserving rotation symmetries in local image features.

قيم البحث

اقرأ أيضاً

163 - Ze Wang , Zichen Miao , Jun Hu 2021
Applying feature dependent network weights have been proved to be effective in many fields. However, in practice, restricted by the enormous size of model parameters and memory footprints, scalable and versatile dynamic convolutions with per-pixel ad apted filters are yet to be fully explored. In this paper, we address this challenge by decomposing filters, adapted to each spatial position, over dynamic filter atoms generated by a light-weight network from local features. Adaptive receptive fields can be supported by further representing each filter atom over sets of pre-fixed multi-scale bases. As plug-and-play replacements to convolutional layers, the introduced adaptive convolutions with per-pixel dynamic atoms enable explicit modeling of intra-image variance, while avoiding heavy computation, parameters, and memory cost. Our method preserves the appealing properties of conventional convolutions as being translation-equivariant and parametrically efficient. We present experiments to show that, the proposed method delivers comparable or even better performance across tasks, and are particularly effective on handling tasks with significant intra-image variance.
Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions, which have the image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlocks the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter&compute costs than the competitive baselines. The code is available at https://github.com/saic-mdal/lama.
Equivariant neural networks (ENNs) are graph neural networks embedded in $mathbb{R}^3$ and are well suited for predicting molecular properties. The ENN library e3nn has customizable convolutions, which can be designed to depend only on distances betw een points, or also on angular features, making them rotationally invariant, or equivariant, respectively. This paper studies the practical value of including angular dependencies for molecular property prediction directly via an ablation study with texttt{e3nn} and the QM9 data set. We find that, for fixed network depth and parameter count, adding angular features decreased test error by an average of 23%. Meanwhile, increasing network depth decreased test error by only 4% on average, implying that rotationally equivariant layers are comparatively parameter efficient. We present an explanation of the accuracy improvement on the dipole moment, the target which benefited most from the introduction of angular features.
An abstract theory of Fourier series in locally convex topological vector spaces is developed. An analog of Fej{e}rs theorem is proved for these series. The theory is applied to distributional solutions of Cauchy-Riemann equations to recover basic re sults of complex analysis. Some classical results of function theory are also shown to be consequences of the series expansion.
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal d ependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا