Salient object detection is a fundamental topic in computer vision. Previous methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation. To tackle these two dilemmas, we propose a novel multi-modal and multi-scale refined network (M2RNet). Three essential components are presented in this network. The nested dual attention module (NDAM) explicitly exploits the combined features of RGB and depth flows. The adjacent interactive aggregation module (AIAM) gradually integrates the neighbor features of high, middle and low levels. The joint hybrid optimization loss (JHOL) makes the predictions have a prominent outline. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches.
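As a rough illustration of the kind of hybrid objective that encourages prominent outlines, the sketch below combines pixel-wise BCE with a soft IoU term; the abstract does not specify JHOL's exact composition, so this is an assumed form rather than the paper's loss.

```python
import torch
import torch.nn.functional as F

def hybrid_saliency_loss(pred_logits, gt_mask):
    """Hedged sketch of a hybrid saliency loss (assumed composition, not
    necessarily JHOL's exact terms). gt_mask is a float tensor in [0, 1]
    with the same shape as pred_logits."""
    # Pixel-wise binary cross-entropy.
    bce = F.binary_cross_entropy_with_logits(pred_logits, gt_mask)
    # Soft IoU term over spatial dimensions, which tends to sharpen outlines.
    pred = torch.sigmoid(pred_logits)
    inter = (pred * gt_mask).sum(dim=(-2, -1))
    union = (pred + gt_mask - pred * gt_mask).sum(dim=(-2, -1))
    soft_iou = 1.0 - (inter + 1.0) / (union + 1.0)
    return bce + soft_iou.mean()
```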
Modeling joint probability distributions is an important task in a wide variety of fields. One popular technique for this employs a family of multivariate distributions with uniform marginals called copulas. While the theory of modeling joint distributions via copulas is well understood, it becomes practically challenging to accurately model real data with many variables. In this work, we design quantum machine learning algorithms to model copulas. We show that any copula can be naturally mapped to a multipartite maximally entangled state. A variational ansatz, which we christen a 'qopula', creates arbitrary correlations between variables while maintaining the copula structure, starting from a set of Bell pairs for two variables or GHZ states for multiple variables. As an application, we train a Quantum Generative Adversarial Network (QGAN) and a Quantum Circuit Born Machine (QCBM) using this variational ansatz to generate samples from joint distributions of two variables for historical stock-market data. We demonstrate our generative learning algorithms on trapped-ion quantum computers from IonQ for up to 8 qubits and show that our results outperform those obtained through equivalent classical generative learning. Further, we present theoretical arguments, based on communication and computational complexity, for an exponential advantage in the expressivity of our models over classical models.
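For reference, the copula structure referred to above is the one guaranteed by Sklar's theorem: any joint distribution factors through a copula with uniform marginals.

```latex
% Sklar's theorem: a joint CDF F with marginals F_1, ..., F_d factors as
F(x_1, \dots, x_d) \;=\; C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr),
\qquad C : [0,1]^d \to [0,1],
% where C is a copula, i.e. a joint CDF whose marginals are uniform on [0,1].
```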
In this paper, we examine the effect of background risk on portfolio selection and optimal reinsurance design under the criterion of maximizing the probability of reaching a goal. Following the literature, we adopt dependence uncertainty to model the dependence ambiguity between financial risk (or insurable risk) and background risk. Because the goal-reaching objective function is non-concave, these two problems pose highly unconventional and challenging issues for which classical optimization techniques often fail. Using the quantile formulation method, we derive the optimal solutions explicitly. The results show that the presence of background risk does not alter the shape of the solution but instead changes the parameter values of the solution. Finally, numerical examples are given to illustrate the results and verify the robustness of our solutions.
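To make the quantile formulation concrete: if G = F_X^{-1} is the quantile function of the payoff X and U is uniform on (0,1), then G(U) has the same law as X, so a law-invariant objective such as the goal-reaching probability can be rewritten as an integral over quantiles. This is a minimal sketch of the idea; the paper's actual formulation involves further constraints not shown here.

```latex
% Goal-reaching probability rewritten over the quantile function G = F_X^{-1}:
\mathbb{P}(X \ge b) \;=\; \mathbb{P}\bigl(G(U) \ge b\bigr)
\;=\; \int_0^1 \mathbf{1}\{G(p) \ge b\}\, dp,
\qquad U \sim \mathrm{Uniform}(0,1),
% so optimizing over random payoffs X becomes optimizing over
% non-decreasing quantile functions G.
```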
Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. This new task requires a model to infer whether a natural language statement entails or contradicts a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement that involves multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves an in-depth understanding of the video over complex interactions. Specifically, it performs joint reasoning over video and subtitles in three hierarchies, where the graph structure is adaptively adjusted according to the semantic structure of the statement. Second, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network across the three hierarchies. Semantic coherence learning further improves the alignment between vision and language and the coherence across a sequence of video segments. Experimental results show that our method outperforms the baseline by a large margin.
Ya Deng, Peiling Li, Chao Zhu (2021)
Recently, new states of matter such as superconducting or topological quantum states were found in transition metal dichalcogenides (TMDs) and manifest themselves in a series of exotic physical behaviors. Such phenomena have been demonstrated in a series of transition metal tellurides including MoTe2, WTe2 and alloyed MoxW1-xTe2. However, behaviors in the alloy system have rarely been addressed because of the difficulty of obtaining atomic layers with controlled composition, although the alloy offers a great platform for tuning the quantum states. Here, we report a facile CVD method to synthesize MoxW1-xTe2 with controllable thickness and chemical composition ratios. The atomic structure of the monolayer MoxW1-xTe2 alloy was experimentally confirmed by scanning transmission electron microscopy (STEM). Importantly, two different transport behaviors, superconducting and Weyl semimetal (WSM) states, were observed in Mo-rich Mo0.8W0.2Te2 and W-rich Mo0.2W0.8Te2 samples, respectively. Our results show that the electrical properties of MoxW1-xTe2 can be tuned by controlling the chemical composition, demonstrating that our controllable CVD growth method is an efficient strategy for manipulating the physical properties of TMDs. Meanwhile, it provides a perspective for further understanding and sheds light on the design of devices based on topological multicomponent TMD materials.
Uncertainties from experiments and models render multi-modal difficulties in model calibration. Bayesian inference and MCMC algorithms have been applied to obtain posterior distributions of model parameters under uncertainty. However, multi-modality makes it difficult to assess the convergence of parallel MCMC sampling chains. The commonly applied $\widehat{R}$ diagnostic does not behave well when multiple sampling chains evolve towards different modes. Both partitional and hierarchical clustering methods have therefore been combined with the traditional $\widehat{R}$ diagnostic to handle sampling of target distributions that are rough and multi-modal. It is observed that the distributions of the binding parameters and the particle pore-diffusion parameters are multi-modal. Therefore, the steric mass-action model used to describe ion-exchange effects of the model protein, lysozyme, on the SP Sepharose FF stationary phase might not be fully adequate under certain experimental conditions, as model uncertainty from the steric mass-action model would result in multi-modality.
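For concreteness, below is a minimal sketch of the standard $\widehat{R}$ (Gelman-Rubin) diagnostic for a single scalar parameter across parallel chains. The clustering-augmented variant described above is not shown; it would group the chains before applying a check of this form.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor R-hat for parallel MCMC chains.
    chains: array of shape (n_chains, n_samples) for a single parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    chain_vars = chains.var(axis=1, ddof=1)
    W = chain_vars.mean()                    # within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)              # values near 1 suggest convergence
```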
Graph Convolutional Networks (GCNs) are typically studied through the lens of Euclidean geometry. Non-Euclidean Riemannian manifolds provide specific inductive biases for embedding hierarchical or spherical data, but cannot align well with data of mixed topologies. We consider a larger class of semi-Riemannian manifolds with indefinite metric that generalize the hyperboloid and the sphere as well as their submanifolds. We develop new geodesic tools that allow neural network operations to be extended into geodesically disconnected semi-Riemannian manifolds. As a consequence, we derive a principled Semi-Riemannian GCN that, for the first time, models data in semi-Riemannian manifolds of constant nonzero curvature in the context of graph neural networks. Our method provides a geometric inductive bias that is sufficiently flexible to model mixed heterogeneous topologies such as hierarchical graphs with cycles. Empirical results demonstrate that our method outperforms Riemannian counterparts when embedding graphs of complex topologies.
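As a reminder of the geometry involved (the paper's exact conventions may differ), constant-curvature semi-Riemannian manifolds can be realized as pseudo-hyperboloids defined by an indefinite scalar product of index q, which recovers the sphere and the hyperboloid as special cases.

```latex
% Indefinite scalar product of index q and the associated pseudo-hyperboloid:
\langle x, y \rangle_q \;=\; -\sum_{i=1}^{q} x_i y_i \;+\; \sum_{i=q+1}^{d+1} x_i y_i,
\qquad
\mathcal{Q}^{d}_{q} \;=\; \bigl\{\, x \in \mathbb{R}^{d+1} : \langle x, x \rangle_q = 1/K \,\bigr\},
% q = 0 with K > 0 gives the sphere; q = 1 with K < 0 gives the hyperboloid
% model of hyperbolic space.
```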
It has been shown that deep neural networks are prone to overfitting on biased training data. To address this issue, meta-learning employs a meta model to correct the training bias. Despite promising performance, extremely slow training is currently the bottleneck of meta-learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show that our method is able to save two-thirds of the training time while maintaining comparable, or even achieving better, generalization performance. In particular, our method achieves state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.
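For orientation, the sketch below shows a generic one-step meta-gradient computation in a sample re-weighting setup (an assumed setting, not FaMUS itself); the final second-order backward pass through the virtual update is the expensive step that a layer-wise approximation would target.

```python
import torch
import torch.nn as nn

# Hedged sketch: meta parameters are per-sample weights on a noisy batch.
model = nn.Linear(10, 2)
meta_weight = torch.ones(32, requires_grad=True)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
x_meta, y_meta = torch.randn(16, 10), torch.randint(0, 2, (16,))
lr = 0.1

# Weighted training loss and a virtual, differentiable parameter update.
losses = nn.functional.cross_entropy(model(x), y, reduction="none")
weighted = (torch.sigmoid(meta_weight) * losses).mean()
grads = torch.autograd.grad(weighted, list(model.parameters()), create_graph=True)
w_new = model.weight - lr * grads[0]
b_new = model.bias - lr * grads[1]

# Meta loss on clean data with the virtual parameters, then the meta gradient.
meta_logits = nn.functional.linear(x_meta, w_new, b_new)
meta_loss = nn.functional.cross_entropy(meta_logits, y_meta)
meta_grad = torch.autograd.grad(meta_loss, meta_weight)[0]  # expensive second-order step
```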
Text-video retrieval is a challenging task that aims to search for relevant video content based on natural language descriptions. The key to this problem is to measure text-video similarities in a joint embedding space. However, most existing methods only consider the global cross-modal similarity and overlook local details. Some works incorporate local comparisons through cross-modal local matching and reasoning, but these complex operations introduce a tremendous computational cost. In this paper, we design an efficient global-local alignment method. The multi-modal video sequences and text features are adaptively aggregated with a set of shared semantic centers. The local cross-modal similarities are computed between the video feature and the text feature within the same center. This design enables meticulous local comparison and reduces the computational cost of the interaction between each text-video pair. Moreover, a global alignment method is proposed to provide a global cross-modal measurement that is complementary to the local perspective. The globally aggregated visual features also provide additional supervision, which is indispensable for optimizing the learnable semantic centers. We achieve consistent improvements on three standard text-video retrieval benchmarks and outperform the state of the art by a clear margin.
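A hedged sketch of what aggregation onto shared semantic centers could look like (a hypothetical module, not the paper's exact design): each token of a video or text sequence is softly assigned to K learnable centers shared by both modalities, and the resulting per-center features can then be compared locally, e.g. by cosine similarity at the same center index.

```python
import torch
import torch.nn as nn

class SharedCenterAggregation(nn.Module):
    """Illustrative sketch: aggregate a sequence of features onto K shared
    semantic centers via soft assignment (assumed form, not the paper's)."""
    def __init__(self, dim, num_centers):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim))

    def forward(self, feats):                   # feats: (batch, seq_len, dim)
        logits = feats @ self.centers.t()       # similarity of each token to each center
        assign = logits.softmax(dim=-1)         # soft assignment over centers
        # Assignment-weighted average of token features per center -> (batch, K, dim).
        num = assign.transpose(1, 2) @ feats
        den = assign.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6
        return num / den
```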
Few-shot object detection (FSOD) aims to strengthen the performance of novel object detection with few labeled samples. To alleviate the constraint of few samples, enhancing the generalization ability of the learned features for novel objects plays a key role. Thus, the feature learning process of FSOD should focus more on intrinsic object characteristics, which are invariant under different visual changes and therefore helpful for feature generalization. Unlike previous attempts based on the meta-learning paradigm, in this paper we explore how to enhance object features with intrinsic characteristics that are universal across different object categories. We propose a new prototype, namely the universal prototype, that is learned from all object categories. Besides the advantage of characterizing invariant characteristics, the universal prototypes alleviate the impact of unbalanced object categories. After enhancing object features with the universal prototypes, we impose a consistency loss to maximize the agreement between the enhanced features and the original ones, which is beneficial for learning invariant object characteristics. Thus, we develop a new framework for few-shot object detection with universal prototypes (FSOD^up) that has the merit of feature generalization towards novel objects. Experimental results on PASCAL VOC and MS COCO show the effectiveness of FSOD^up. Particularly, for the 1-shot case of VOC Split2, FSOD^up outperforms the baseline by 6.8% in terms of mAP.
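The consistency term could, for instance, take a cosine-agreement form like the hedged sketch below (an assumed instantiation; the abstract does not give the exact loss).

```python
import torch.nn.functional as F

def consistency_loss(original_feats, enhanced_feats):
    """Hypothetical consistency loss: encourage prototype-enhanced object
    features to agree with the original ones via cosine similarity."""
    orig = F.normalize(original_feats, dim=-1)
    enh = F.normalize(enhanced_feats, dim=-1)
    # 1 - cosine similarity, averaged over all object features.
    return (1.0 - (orig * enh).sum(dim=-1)).mean()
```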