Research papers and master's and doctoral theses published by Lili Wang

Embedding Node Structural Role Identity Using Stress Majorization

199 - Lili Wang , Chenghan Huang , Weicheng Ma 2021

Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, the role identity captures the functional role that nodes play in a network, such as being the center of a group, or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximation or indirect modeling of structural equivalence. In this paper, we present a novel and flexible framework using stress majorization to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) to a low-dimensional embedding space. Our method is also flexible, in that it does not rely on specific structural similarity definitions. We evaluated our method on the tasks of node classification, clustering, and visualization, using three real-world and five synthetic networks. Our experiments show that our framework achieves better results than existing methods in learning node role representations.

Social and Information Networks, Artificial Intelligence, Machine Learning

Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization

524 - Lili Wang , Chenghan Huang , Weicheng Ma 2021

Recent years have seen a rise in the development of representational learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are currently relatively sparse. In this paper, we propose a novel unsupervised whole graph embedding method. Our method uses spectral graph wavelets to capture topological similarities on each k-hop sub-graph between nodes and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that our method achieves the best performance across all experiments, outperforming the current state-of-the-art by a considerable margin.

Machine Learning, Artificial Intelligence, Social and Information Networks

Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition

110 - Xinda Liu , Lili Wang , Xiaoguang Han 2021

Fine-grained image recognition is challenging because discriminative clues are usually fragmented, whether from a single image or multiple images. Despite their significant improvements, most existing methods still focus on the most discriminative parts from a single image, ignoring informative details in other regions and lacking consideration of clues from other associated images. In this paper, we analyze the difficulties of fine-grained image recognition from a new perspective and propose a transformer architecture with a peak suppression module and a knowledge guidance module, which respects the diversification of discriminative features in a single image and the aggregation of discriminative clues among multiple images. Specifically, the peak suppression module first utilizes a linear projection to convert the input image into sequential tokens. It then blocks the token based on the attention response generated by the transformer encoder. This module penalizes the attention to the most discriminative parts in the feature learning process, thereby enhancing the information exploitation of the neglected regions. The knowledge guidance module compares the image-based representation generated from the peak suppression module with the learnable knowledge embedding set to obtain the knowledge response coefficients. Afterwards, it formalizes the knowledge learning as a classification problem using response coefficients as the classification scores. Knowledge embeddings and image-based representations are updated during training so that the knowledge embedding includes discriminative clues for different images. Finally, we incorporate the acquired knowledge embeddings into the image-based representations as comprehensive representations, leading to significantly higher performance. Extensive evaluations on six popular datasets demonstrate the advantage of the proposed method.

Multimedia, Computer Vision and Pattern Recognition, Image and Video Processing

Embedding Heterogeneous Networks into Hyperbolic Space Without Meta-path

179 - Lili Wang , Chongyang Gao , Chenghan Huang 2021

Networks found in the real world are numerous and varied. A common type of network is the heterogeneous network, where the nodes (and edges) can be of different types. Accordingly, there have been efforts at learning representations of these heterogeneous networks in low-dimensional space. However, most of the existing heterogeneous network embedding methods suffer from the following two drawbacks: (1) The target space is usually Euclidean. Conversely, many recent works have shown that complex networks may have hyperbolic latent anatomy, which is non-Euclidean. (2) These methods usually rely on meta-paths, which require domain-specific prior knowledge for meta-path selection. Additionally, different downstream tasks on the same network might require different meta-paths in order to generate task-specific embeddings. In this paper, we propose a novel self-guided random walk method that does not require meta-paths for embedding heterogeneous networks into hyperbolic space. We conduct thorough experiments for the tasks of network reconstruction and link prediction on two public datasets, showing that our model outperforms a variety of well-known baselines across all tasks.

Social and Information Networks, Artificial Intelligence

Political Depolarization of News Articles Using Attribute-aware Word Embeddings

162 - Ruibo Liu , Lili Wang , Chenyan Jia 2021

Political polarization in the US is on the rise. This polarization negatively affects the public sphere by contributing to the creation of ideological echo chambers. In this paper, we focus on addressing one of the factors that contribute to this polarity, polarized media. We introduce a framework for depolarizing news articles. Given an article on a certain topic with a particular ideological slant (e.g., liberal or conservative), the framework first detects polar language in the article and then generates a new article with the polar language replaced with neutral expressions. To detect polar words, we train a multi-attribute-aware word embedding model that is aware of ideology and topics on 360k full-length media articles. Then, for text generation, we propose a new algorithm called Text Annealing Depolarization Algorithm (TADA). TADA retrieves neutral expressions from the word embedding model that not only decrease ideological polarity but also preserve the original argument of the text, while maintaining grammatical correctness. We evaluate our framework by comparing the depolarized output of our model in two modes, fully-automatic and semi-automatic, on 99 stories spanning 11 topics. Based on feedback from 161 human testers, our framework successfully depolarized 90.1% of paragraphs in semi-automatic mode and 78.3% of paragraphs in fully-automatic mode. Furthermore, 81.2% of the testers agree that the non-polar content information is well-preserved and 79% agree that depolarization does not harm semantic correctness when they compare the original text and the depolarized text. Our work shows that data-driven methods can help to locate political polarity and aid in the depolarization of articles.

Computation and Language, Artificial Intelligence

Improvements and Extensions on Metaphor Detection

138 - Weicheng Ma , Ruibo Liu , Lili Wang 2020

Metaphors are ubiquitous in human language. The metaphor detection task (MD) aims at detecting and interpreting metaphors from written language, which is crucial in natural language understanding (NLU) research. In this paper, we introduce a pre-trained Transformer-based model into MD. Our model outperforms the previous state-of-the-art models by large margins in our evaluations, with relative improvements on the F-1 score from 5.33% to 28.39%. Second, we extend MD to a classification task about the metaphoricity of an entire piece of text to make MD applicable in more general NLU scenes. Finally, we clean up the improper or outdated annotations in one of the MD benchmark datasets and re-benchmark it with our Transformer-based model. This approach could be applied to other existing MD datasets as well, since the metaphoricity annotations in these benchmark datasets may be outdated. Future research efforts are also necessary to build an up-to-date and well-annotated dataset consisting of longer and more complex texts.

Computation and Language, Machine Learning

An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data

173 - Lili Wang , Chongyang Gao , Jason Wei 2020

The field of NLP has seen unprecedented achievements in recent years. Most notably, with the advent of large-scale pre-trained Transformer-based language models, such as BERT, there has been a noticeable improvement in text representation. It is, however, unclear whether these improvements translate to noisy user-generated text, such as tweets. In this paper, we present an experimental survey of a wide range of well-known text representation techniques for the task of text clustering on noisy Twitter data. Our results indicate that the more advanced models do not necessarily work best on tweets and that more exploration in this area is needed.

Computation and Language, Machine Learning

Towards Improved Model Design for Authorship Identification: A Survey on Writing Style Understanding

86 - Weicheng Ma , Ruibo Liu , Lili Wang 2020

Authorship identification tasks, which rely heavily on linguistic styles, have always been an important part of Natural Language Understanding (NLU) research. While other tasks based on linguistic style understanding benefit from deep learning methods, these methods have not behaved as well as traditional machine learning methods in many authorship-based tasks. With these tasks becoming more and more challenging, however, traditional machine learning methods based on handcrafted feature sets are already approaching their performance limits. Thus, in order to inspire future applications of deep learning methods in authorship-based tasks in ways that benefit the extraction of stylistic features, we survey authorship-based tasks and other tasks related to writing style understanding. We first describe our survey results on the current state of research in both sets of tasks and summarize existing achievements and problems in authorship-related tasks. We then describe outstanding methods in style-related tasks in general and analyze how they are used in combination in the top-performing models. We are optimistic about the applicability of these models to authorship-based tasks and hope our survey will help advance research in this field.

Computation and Language, Machine Learning

A Simulation-free Group Sequential Design with Max-combo Tests in the Presence of Non-proportional Hazards

282 - Lili Wang, Department of Biostatistics 2019

Non-proportional hazards (NPH) have been observed recently in many immuno-oncology clinical trials. Weighted log-rank tests (WLRT) with suitably chosen weights can be used to improve the power of detecting the difference between the two survival curves in the presence of NPH. However, it is not easy to choose a proper WLRT in practice when both robustness and efficiency are considered. A versatile maxcombo test was proposed to achieve the balance of robustness and efficiency and has received increasing attention in both methodology development and application. However, survival trials often warrant interim analyses due to their high cost and long duration. The integration and application of maxcombo tests in interim analyses often require extensive simulation studies. In this paper, we propose a simulation-free approach for group sequential design with the maxcombo test in survival trials. The simulation results support that the proposed approaches successfully control the type I error rate and offer great accuracy and flexibility in estimating sample sizes, at the cost of only a light computational burden. Notably, our methods display strong robustness to various model misspecifications, and have been implemented in an R package for free access online.

Methodology

Time analyticity of ancient solutions to the heat equation on graphs

72 - Fengwen Han , Bobo Hua , Lili Wang 2019

We study the time analyticity of ancient solutions to heat equations on graphs. Analogous to Dong and Zhang [DZ19], we prove the time analyticity of ancient solutions on graphs under some sharp growth condition.

Differential Geometry, Combinatorics
