Research papers and master's and doctoral theses published by Ze Wang

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

Guogang Liao, Ze Wang, Xiaoxu Wu, 2021

E-commerce platforms usually display a mixed list of ads and organic items in the feed. One key problem is to allocate the limited slots in the feed to maximize overall revenue while improving user experience, which requires a good model of user preference. Instead of modeling the influence of individual items on user behavior, the arrangement signal models the influence of the arrangement of items and may lead to a better allocation strategy. However, most previous strategies fail to model such a signal and therefore yield suboptimal performance. To this end, we propose Cross Deep Q Network (Cross DQN) to extract the arrangement signal by crossing the embeddings of different items and processing the crossed sequence in the feed. Our model achieves higher revenue and better user experience than state-of-the-art baselines in offline experiments. Moreover, our model demonstrates a significant improvement in the online A/B test and has been fully deployed on the Meituan feed to serve more than 300 million customers.

Machine Learning

Origin of Charge Density Wave in Layered Kagome Metal CsV$_3$Sb$_5$

Chongze Wang, Shuyuan Liu, Hyunsoo Jeon, 2021

Using first-principles calculations, we identify the origin of the observed charge density wave (CDW) formation in a layered kagome metal CsV$_3$Sb$_5$. It is revealed that the structural distortion of the kagome lattice forming the trimeric and hexameric V atoms is accompanied by the stabilization of quasimolecular states, which gives rise to the opening of CDW gaps for the V-derived multibands lying around the Fermi level. This Jahn-Teller-like instability having the local lattice distortion and its derived quasimolecular states is a driving force of the CDW order. Specifically, the saddle points of multiple Dirac bands near the Fermi level, located at the $M$ point, are hybridized to disappear along the $k_z$ direction, therefore not supporting the widely accepted Peierls-like electronic instability due to Fermi surface nesting. It is further demonstrated that applied hydrostatic pressure significantly reduces the interlayer spacing to destabilize the quasimolecular states, leading to a disappearance of the CDW phase at a pressure of $\sim$2 GPa. The presently proposed underlying mechanism of the CDW order in CsV$_3$Sb$_5$ can also be applicable to other isostructural kagome lattices such as KV$_3$Sb$_5$ and RbV$_3$Sb$_5$.

Materials Science, Strongly Correlated Electrons

Extremal problems of double stars

Ervin Győri, Runze Wang, Spencer Woolfson, 2021

In a generalized Turán problem, two graphs $H$ and $F$ are given and the question is the maximum number of copies of $H$ in an $F$-free graph of order $n$. In this paper, we study the number of double stars $S_{k,l}$ in triangle-free graphs. We also study an opposite version of this question: what is the maximum number of edges/triangles in graphs with double star type restrictions, which leads us to study two questions related to the extremal number of triangles or edges in graphs with degree-sum constraints over adjacent or non-adjacent vertices.
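As a small worked illustration of the counting problem, the sketch below counts copies of $S_{k,l}$ in a given triangle-free graph. It assumes that $S_{k,l}$ denotes two adjacent centers carrying $k$ and $l$ pendant leaves respectively, and that $k, l \ge 1$; the function name and the adjacency-dictionary input are illustrative choices, not taken from the paper. In a triangle-free graph two adjacent vertices share no neighbor, so the two leaf sets can be chosen independently and the count reduces to a sum of binomial products over edges.

```python
from math import comb


def count_double_stars(adj, k, l):
    """Count copies of the double star S_{k,l} in a triangle-free graph.

    adj maps each vertex to the set of its neighbours (simple, undirected).
    Assumes the graph is triangle-free and k, l >= 1; neither is checked.
    """
    total = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            # u is the centre with k leaves, v the centre with l leaves.
            # Leaves come from the neighbours other than the opposite centre.
            total += comb(len(adj[u]) - 1, k) * comb(len(adj[v]) - 1, l)
    if k == l:
        # Both orientations of an edge describe the same copy.
        total //= 2
    return total


# Complete bipartite graph K_{2,3}, which is triangle-free.
k23 = {
    "a1": {"b1", "b2", "b3"},
    "a2": {"b1", "b2", "b3"},
    "b1": {"a1", "a2"},
    "b2": {"a1", "a2"},
    "b3": {"a1", "a2"},
}

print(count_double_stars(k23, 1, 2))  # 6
print(count_double_stars(k23, 1, 1))  # 12 (paths on four vertices)
```

The extremal question in the abstract asks how large this count can be over all triangle-free graphs of order $n$; the sketch only evaluates it for one fixed graph.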

Combinatorics

CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models

Arjun R. Akula, Keze Wang, Changsong Liu, 2021

We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e. dialog, between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention (or heat map) based explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on a zebra, pointed ears of a dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the iterative, conceptual and counterfactual nature of CX-ToM explanations, our framework is practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, demonstrating that our CX-ToM significantly outperforms the state-of-the-art explainable AI models.

Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning

Cross-time functional connectivity analysis

Ze Wang, 2021

A large body of literature has shown substantial inter-regional functional connectivity in the mammalian brain. One important property that remains unstudied is the cross-time interareal connection. This paper provides a tool to characterize the cross-time functional connectivity. The method is extended from the temporal embedding based brain temporal coherence analysis. Both synthetic data and in-vivo data were used to evaluate the various properties of the cross-time functional connectivity matrix, which is also called the cross-regional temporal coherence matrix.

Neurons and Cognition, Quantitative Methods

Resting state fMRI-based temporal coherence mapping

Ze Wang, 2021

Long-range temporal coherence (LRTC) is common in dynamic systems and is fundamental to system function. LRTC in the brain has been shown to be important to cognition. Assessing LRTC may provide critical information for understanding the potential underpinnings of brain organization, function, and cognition. To facilitate this overarching goal, we provide a method, named temporal coherence mapping (TCM), to explicitly quantify LRTC using resting state fMRI. TCM is based on correlation analysis of the transit states of the phase space reconstructed by temporal embedding. A few TCM properties were collected to measure LRTC, including the averaged correlation, anti-correlation, the ratio of correlation and anti-correlation, the mean coherent and incoherent duration, and the ratio between the coherent and incoherent time. TCM was first evaluated with simulations and then with the large Human Connectome Project data. Evaluation results showed that TCM metrics can successfully differentiate signals with different temporal coherence regardless of the parameters used to reconstruct the phase space. In the human brain, TCM metrics except the ratio of the coherent/incoherent time showed high test-retest reproducibility; TCM metrics are related to age, sex, and total cognitive scores. In summary, TCM provides a first-of-its-kind tool to assess LRTC and the imbalance between coherence and incoherence; TCM properties are physiologically and cognitively meaningful.
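The general idea of correlating temporally embedded states can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the unit lag, the embedding dimension, the use of Pearson correlation, and every function name are assumptions, and only two of the summary properties listed above (averaged correlation and averaged anti-correlation) are computed.

```python
import math


def embed(series, dim):
    """Temporal embedding with unit lag: one state per window of length dim."""
    return [series[i:i + dim] for i in range(len(series) - dim + 1)]


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def coherence_matrix(series, dim):
    """Correlation between every pair of embedded states."""
    states = embed(series, dim)
    return [[pearson(s, t) for t in states] for s in states]


def mean_correlations(matrix):
    """Mean positive and mean negative off-diagonal correlation."""
    size = len(matrix)
    upper = [matrix[i][j] for i in range(size) for j in range(i + 1, size)]
    positive = [c for c in upper if c > 0]
    negative = [c for c in upper if c < 0]
    mean_pos = sum(positive) / len(positive) if positive else 0.0
    mean_neg = sum(negative) / len(negative) if negative else 0.0
    return mean_pos, mean_neg


# Synthetic example: a slow sinusoid sampled at 60 time points.
signal = [math.sin(0.3 * t) for t in range(60)]
matrix = coherence_matrix(signal, dim=5)
mean_pos, mean_neg = mean_correlations(matrix)
```

For real fMRI data the time series would come from a voxel or region, and the duration-based properties would additionally require thresholding the matrix, which is not attempted here.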

Neurons and Cognition, Quantitative Methods

Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks

Jiaze Wang, Kai Chen, Qi Dou, 2021

Category-level 6D pose estimation, aiming to predict the location and orientation of unseen object instances, is fundamental to many scenarios such as robotic manipulation and augmented reality, yet still remains unsolved. Precisely recovering the instance 3D model in the canonical space and accurately matching it with the observation is an essential point when estimating 6D pose for unseen objects. In this paper, we achieve accurate category-level 6D pose estimation via cascaded relation and recurrent reconstruction networks. Specifically, a novel cascaded relation network is dedicated to advanced representation learning to explore the complex and informative relations among the instance RGB image, instance point cloud and category shape prior. Furthermore, we design a recurrent reconstruction network for iterative residual refinement to progressively improve the reconstruction and correspondence estimations from coarse to fine. Finally, the instance 6D pose is obtained leveraging the estimated dense correspondences between the instance point cloud and the reconstructed 3D model in the canonical space. We have conducted extensive experiments on two well-acknowledged benchmarks of category-level 6D pose estimation, with significant performance improvement over existing approaches. On the representatively strict evaluation metrics of $3D_{75}$ and $5^{\circ}2\,\text{cm}$, our method exceeds the latest state-of-the-art SPD by $4.9\%$ and $17.7\%$ on the CAMERA25 dataset, and by $2.7\%$ and $8.5\%$ on the REAL275 dataset. Code is available at https://wangjiaze.cn/projects/6DPoseEstimation.html.

Computer Vision and Pattern Recognition

Adaptive Convolutions with Per-pixel Dynamic Filter Atom

Ze Wang, Zichen Miao, Jun Hu, 2021

Applying feature-dependent network weights has been proven effective in many fields. However, in practice, restricted by the enormous size of model parameters and memory footprints, scalable and versatile dynamic convolutions with per-pixel adapted filters are yet to be fully explored. In this paper, we address this challenge by decomposing filters, adapted to each spatial position, over dynamic filter atoms generated by a light-weight network from local features. Adaptive receptive fields can be supported by further representing each filter atom over sets of pre-fixed multi-scale bases. As plug-and-play replacements for convolutional layers, the introduced adaptive convolutions with per-pixel dynamic atoms enable explicit modeling of intra-image variance while avoiding heavy computation, parameter, and memory costs. Our method preserves the appealing properties of conventional convolutions, being translation-equivariant and parametrically efficient. We present experiments to show that the proposed method delivers comparable or even better performance across tasks, and is particularly effective in handling tasks with significant intra-image variance.

Computer Vision and Pattern Recognition

Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention

Chengze Wang, Zhiyu Jiang, Yuan Yuan, 2021

Spatial attention is a straightforward approach to enhancing performance in remote sensing image captioning. However, conventional spatial attention approaches consider only the attention distribution on one fixed coarse grid, so that the semantics of tiny objects can easily be ignored or disturbed during visual feature extraction. Worse still, the fixed semantic level of conventional spatial attention limits image understanding at different levels and from different perspectives, which is critical for tackling the huge diversity in remote sensing images. To address these issues, we propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention. 1) Instance awareness is achieved by introducing a multi-level feature architecture that contains the visual information of multi-level instance-possible regions and their surroundings. 2) Moreover, based on this multi-level feature extraction, a cross-hierarchy attention mechanism is proposed to prompt the decoder to dynamically focus on different semantic hierarchies and instances at each time step. The experimental results on public datasets demonstrate the superiority of the proposed approach over existing methods.

Computer Vision and Pattern Recognition

Towards Solving Inefficiency of Self-supervised Representation Learning

Guangrun Wang, Keze Wang, Guangcong Wang, 2021

Self-supervised learning (especially contrastive learning) has attracted great interest due to its tremendous potential for learning discriminative representations in an unsupervised manner. Despite the acknowledged successes, existing contrastive learning methods suffer from very low learning efficiency, e.g., taking about ten times more training epochs than supervised learning for comparable recognition accuracy. In this paper, we discover two contradictory phenomena in contrastive learning that we call under-clustering and over-clustering problems, which are major obstacles to learning efficiency. Under-clustering means that the model cannot efficiently learn to discover the dissimilarity between inter-class samples when the negative sample pairs for contrastive learning are insufficient to differentiate all the actual object categories. Over-clustering implies that the model cannot efficiently learn the feature representation from excessive negative sample pairs, which force the model to over-cluster samples of the same actual categories into different clusters. To simultaneously overcome these two problems, we propose a novel self-supervised learning framework using a median triplet loss. Precisely, we employ a triplet loss tending to maximize the relative distance between the positive pair and negative pairs to address the under-clustering problem; and we construct the negative pair by selecting the negative sample of a median similarity score from all negative samples to avoid the over-clustering problem, guaranteed by the Bernoulli Distribution model. We extensively evaluate our proposed framework on several large-scale benchmarks (e.g., ImageNet, SYSU-30k, and COCO). The results demonstrate the superior performance (e.g., the learning efficiency) of our model over the latest state-of-the-art methods by a clear margin. Code is available at: https://github.com/wanggrun/triplet.
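The two ingredients named in the abstract, selecting the negative with the median similarity score and applying a triplet loss, can be sketched for a single anchor as below. This is a hedged illustration rather than the released code: cosine similarity, the hinge form of the loss, the margin value of 0.2, the tie-breaking for an even number of negatives, and all function names are assumptions, and the Bernoulli-distribution argument from the paper is not modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def median_negative(anchor, negatives):
    """Pick the negative whose similarity to the anchor is the median one.

    With an even number of negatives the upper median is taken (an assumption).
    """
    ranked = sorted(negatives, key=lambda n: cosine_similarity(anchor, n))
    return ranked[len(ranked) // 2]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on similarities: the positive pair should be more
    similar than the negative pair by at least the margin."""
    pos_sim = cosine_similarity(anchor, positive)
    neg_sim = cosine_similarity(anchor, negative)
    return max(0.0, margin + neg_sim - pos_sim)


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[1.0, 0.05], [0.0, 1.0], [-1.0, 0.0]]

chosen = median_negative(anchor, negatives)   # neither hardest nor easiest
loss = triplet_loss(anchor, positive, chosen)
```

Choosing the median rather than the hardest negative reflects the abstract's stated aim of avoiding over-clustering; in practice the embeddings would come from an encoder and the loss would be averaged over a batch.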

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
