Research papers and master's and doctoral theses published by Bin Liu

$C_n$-symmetric higher-order topological crystalline insulators in atomically thin transition-metal dichalcogenides

216 - Shifeng Qian, Gui-bin Liu, Cheng-Cheng Liu 2021

Based on first-principles calculations and symmetry analysis, we predict that atomically thin ($1$ to $N$ layers) 2H group-VIB TMDs $MX_2$ ($M$ = Mo, W; $X$ = S, Se, Te) are large-gap higher-order topological crystalline insulators protected by $C_3$ rotation symmetry. We explicitly demonstrate the nontrivial topological indices and the existence of the hallmark corner states with quantized fractional charge for these familiar TMDs with large bulk optical band gaps ($1.64$ to $1.95$ eV for the monolayers), which would facilitate experimental detection by STM. We find that well-defined corner states exist in triangular finite-size flakes with armchair edges of the atomically thin ($1$ to $N$ layers) 2H group-VIB TMDs, and that the corresponding quantized fractional charge is the number of layers $N$ divided by 3 modulo integers, which simply doubles when the spin degree of freedom is included.
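
The charge rule stated in the abstract can be tabulated with a few lines of Python. This is only an arithmetic sketch: the function name `corner_charge`, the choice of units of the elementary charge $e$, and the reading of "doubling" as multiplying the spinless value by 2 before reducing modulo integers are assumptions, not details taken from the paper.

```python
from fractions import Fraction


def corner_charge(num_layers: int, include_spin: bool = False) -> Fraction:
    """Corner charge N/3 modulo integers (assumed to be in units of e).

    With include_spin=True the spinless value is doubled before the
    reduction modulo 1, which is our reading of the abstract.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be a positive integer")
    charge = Fraction(num_layers, 3)
    if include_spin:
        charge *= 2
    return charge % 1


if __name__ == "__main__":
    # Tabulate the spinless and spinful values for the first few N.
    for n in range(1, 7):
        print(n, corner_charge(n), corner_charge(n, include_spin=True))
```

Under these assumptions the charge vanishes modulo integers whenever $N$ is a multiple of 3.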

Mesoscale and Nanoscale Physics, Materials Science

Massive molecular gas reservoir in a luminous sub-millimeter galaxy during cosmic noon

105 - Bin Liu, N. Chartab, H. Nayyeri 2021

We present multi-band observations of an extremely dusty star-forming lensed galaxy (HERS1) at $z=2.553$. High-resolution maps of HST/WFC3, SMA, and ALMA show a partial Einstein-ring with a radius of $\sim 3^{\prime\prime}$. The deeper HST observations also show the presence of a lensing arc feature associated with a second lens source, identified to be at the same redshift as the bright arc based on a detection of the [NII] 205 $\mu$m emission line with ALMA. A detailed model of the lensing system is constructed using the high-resolution HST/WFC3 image, which allows us to study the source plane properties and connect rest-frame optical emission with properties of the galaxy as seen in sub-millimeter and millimeter wavelengths. Corrected for lensing magnification, the spectral energy distribution fitting results yield an intrinsic star formation rate of about $1000\pm260$ ${\rm M_{\odot}}$ yr$^{-1}$, a stellar mass ${\rm M_*}=4.3^{+2.2}_{-1.0}\times10^{11}\,{\rm M_{\odot}}$, and a dust temperature ${\rm T}_{\rm d}=35^{+2}_{-1}$ K. The intrinsic CO emission line ($J_{\rm up}=3,4,5,6,7,9$) flux densities and CO spectral line energy distribution are derived based on the velocity-dependent magnification factors. We apply a radiative transfer model using the large velocity gradient method with two excitation components to study the gas properties. The low-excitation component has a gas density $n_{\rm H_2}=10^{3.1\pm0.6}$ cm$^{-3}$ and kinetic temperature ${\rm T}_{\rm k}=19^{+7}_{-5}$ K and a high-excitation component has $n_{\rm H_2}=10^{2.8\pm0.3}$ cm$^{-3}$ and ${\rm T}_{\rm k}=550^{+260}_{-220}$ K. Additionally, HERS1 has a gas fraction of about $0.4\pm0.2$ and is expected to last 250 Myr. These properties offer a detailed view of a typical sub-millimeter galaxy during the peak epoch of star-formation activity.

Astrophysics of Galaxies

ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

144 - Zhenchao Jin, Bin Liu, Qi Chu 2021

Co-occurring visual patterns make aggregating contextual information a common paradigm for enhancing the pixel representation in semantic image segmentation. Existing approaches focus on modeling the context from the perspective of the whole image, i.e., aggregating the image-level contextual information. Although impressive, these methods weaken the significance of the pixel representations of the same category, i.e., the semantic-level contextual information. To address this, this paper proposes to augment the pixel representations by aggregating both the image-level and the semantic-level contextual information. First, an image-level context module is designed to capture the contextual information for each pixel in the whole image. Second, we aggregate the representations of the same category for each pixel, where the category regions are learned under the supervision of the ground-truth segmentation. Third, we compute the similarities between each pixel representation and the image-level and semantic-level contextual information, respectively. Finally, a pixel representation is augmented by a weighted aggregation of the image-level and semantic-level contextual information, with the similarities as the weights. Integrating the image-level and semantic-level context allows this paper to report state-of-the-art accuracy on four benchmarks, i.e., ADE20K, LIP, COCOStuff and Cityscapes.
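
The final similarity-weighted aggregation step can be illustrated for a single pixel as follows. This is a minimal sketch and not the ISNet implementation: the cosine similarity, the softmax normalization of the two weights, the concatenation of the aggregated context onto the pixel feature, and the name `augment_pixel` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def augment_pixel(pixel, image_ctx, semantic_ctx):
    """Augment one pixel feature with two context vectors.

    Assumptions (not from the paper): cosine similarity, softmax over the
    two similarities, and concatenation of the weighted context.
    """
    sims = [cosine(pixel, image_ctx), cosine(pixel, semantic_ctx)]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    w_img, w_sem = exps[0] / total, exps[1] / total
    context = [w_img * a + w_sem * b for a, b in zip(image_ctx, semantic_ctx)]
    return list(pixel) + context, (w_img, w_sem)


if __name__ == "__main__":
    feature, weights = augment_pixel([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(feature, weights)
```

In this toy call the pixel is aligned with the image-level context, so that context receives the larger weight.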

Computer Vision and Pattern Recognition

EncoderMI: Membership Inference against Pre-trained Encoders in Contrastive Learning

156 - Hongbin Liu, Jinyuan Jia, Wenjie Qu 2021

Given a set of unlabeled images or (image, text) pairs, contrastive learning aims to pre-train an image encoder that can be used as a feature extractor for many downstream tasks. In this work, we propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning. In particular, given an input and black-box access to an image encoder, EncoderMI aims to infer whether the input is in the training dataset of the image encoder. EncoderMI can be used 1) by a data owner to audit whether its (public) data was used to pre-train an image encoder without its authorization or 2) by an attacker to compromise the privacy of the training data when it is private/sensitive. Our EncoderMI exploits the overfitting of the image encoder towards its training data. In particular, an overfitted image encoder is more likely to output more (or less) similar feature vectors for two augmente

Cryptography and Security, Computer Vision and Pattern Recognition, Machine Learning

Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach

89 - Chuanbo Hu, Minglei Yin, Bin Liu 2021

Social media such as Instagram and Twitter have become important platforms for marketing and selling illicit drugs. Detection of online illicit drug trafficking has become critical to combat the online trade of illicit drugs. However, the legal status often varies spatially and temporally; even for the same drug, federal and state legislation can have different regulations about its legality. Meanwhile, more drug trafficking events are disguised as a novel form of advertising commenting, leading to information heterogeneity. Accordingly, accurate detection of illicit drug trafficking events (IDTEs) from social media has become even more challenging. In this work, we conduct the first systematic study on fine-grained detection of IDTEs on Instagram. We propose a deep multimodal multilabel learning (DMML) approach to detect IDTEs and demonstrate its effectiveness on a newly constructed dataset called multimodal IDTE (MM-IDTE). Specifically, our model takes text and image data as the input and combines multimodal information to predict multiple labels of illicit drugs. Inspired by the success of BERT, we have developed a self-supervised multimodal bidirectional transformer by jointly fine-tuning pretrained text and image encoders. We have constructed a large-scale dataset MM-IDTE with manually annotated multiple drug labels to support fine-grained detection of illicit drugs. Extensive experimental results on the MM-IDTE dataset show that the proposed DMML methodology can accurately detect IDTEs even in the presence of special characters and style changes attempting to evade detection.

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion

88 - Chuanbo Hu, Minglei Yin, Bin Liu 2021

Illicit drug trafficking via social media sites such as Instagram has become a severe problem, drawing a great deal of attention from law enforcement and public health agencies. How to identify illicit drug dealers from social media data has remained a technical challenge for the following reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). In total, nearly 4,000 user accounts, of which over 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.

Machine Learning, Image and Video Processing

Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

80 - Shulun Wang, Bin Liu, Feng Liu 2021

Softmax is widely used in neural networks for multiclass classification, gate structures and attention mechanisms. The statistical assumption that the input is normally distributed supports the gradient stability of Softmax. However, when Softmax is used in attention mechanisms such as transformers, the correlation scores between embeddings are often not normally distributed, so the gradient vanishing problem appears; we confirm this point experimentally. In this work, we suggest replacing the exponential function with periodic functions, and we explore some potential periodic alternatives to Softmax from the view of value and gradient. Through experiments on a simply designed demo referenced to LeViT, our method is shown to alleviate the gradient problem and yield substantial improvements compared to Softmax and its variants. Further, we analyze the impact of pre-normalization for Softmax and our methods through mathematics and experiments. Lastly, we increase the depth of the demo and demonstrate the applicability of our method in deep structures.
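
The saturation effect described in the abstract can be seen numerically with a small sketch. The `periodic_normalize` function below is a hypothetical stand-in that swaps the exponential for $\sin^2$; it is not claimed to be one of the functions studied in the paper, and the score vector is made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def periodic_normalize(scores, eps=1e-9):
    """Hypothetical periodic alternative: sin^2 replaces exp."""
    vals = [math.sin(s) ** 2 for s in scores]
    total = sum(vals) + eps
    return [v / total for v in vals]


def self_gradient(fn, scores, index=0, h=1e-5):
    """Central finite-difference estimate of d fn(scores)[index] / d scores[index]."""
    up = list(scores)
    down = list(scores)
    up[index] += h
    down[index] -= h
    return (fn(up)[index] - fn(down)[index]) / (2 * h)


if __name__ == "__main__":
    scores = [12.0, 0.5, -3.0, 1.0]  # one dominant score saturates softmax
    print("softmax :", self_gradient(softmax, scores))
    print("periodic:", self_gradient(periodic_normalize, scores))
```

For softmax the self-gradient equals $p_i(1-p_i)$, which shrinks toward zero as one score dominates, while the bounded periodic function keeps a gradient of ordinary magnitude at the same input.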

Computer Vision and Pattern Recognition, Machine Learning

Lifelong Intent Detection via Multi-Strategy Rebalancing

101 - Qingbin Liu, Xiaoyan Yu, Shizhu He 2021

Conventional Intent Detection (ID) models are usually trained offline, which relies on a fixed dataset and a predefined set of intent classes. However, in real-world applications, online systems usually involve continually emerging new user intents, which pose a great challenge to the offline training paradigm. Recently, lifelong learning has received increasing attention and is considered to be the most promising solution to this challenge. In this paper, we propose Lifelong Intent Detection (LID), which continually trains an ID model on new data to learn newly emerging intents while avoiding catastrophically forgetting old data. Nevertheless, we find that existing lifelong learning methods usually suffer from a serious imbalance between old and new data in the LID task. Therefore, we propose a novel lifelong learning method, Multi-Strategy Rebalancing (MSR), which consists of cosine normalization, hierarchical knowledge distillation, and inter-class margin loss to alleviate the multiple negative effects of the imbalance problem. Experimental results demonstrate the effectiveness of our method, which significantly outperforms previous state-of-the-art lifelong learning methods on the ATIS, SNIPS, HWU64, and CLINC150 benchmarks.

Computation and Language

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

100 - Rui Qian, Yuxi Li, Huabin Liu 2021

The crux of self-supervised video representation learning is to build general features from unlabeled videos. However, most recent works have mainly focused on high-level semantics and neglected lower-level representations and their temporal relationship which are crucial for general video understanding. To address these challenges, this paper proposes a multi-level feature optimization framework to improve the generalization and temporal modeling ability of learned video representations. Concretely, high-level features obtained from naive and prototypical contrastive learning are utilized to build distribution graphs, guiding the process of low-level and mid-level feature learning. We also devise a simple temporal modeling module from multi-level features to enhance motion pattern learning. Experiments demonstrate that multi-level feature optimization with the graph constraint and temporal modeling can greatly improve the representation ability in video understanding. Code is available at https://github.com/shvdiwnkozbw/Video-Representation-via-Multi-level-Optimization.

Computer Vision and Pattern Recognition

Cascaded Residual Density Network for Crowd Counting

158 - Kun Zhao, Luchuan Song, Bin Liu 2021

Crowd counting is a challenging task due to issues such as scale variation and perspective variation in real crowd scenes. In this paper, we propose a novel Cascaded Residual Density Network (CRDNet) that uses a coarse-to-fine approach to generate a high-quality density map for more accurate crowd counting. (1) We estimate the residual density maps from multi-scale pyramidal features through cascaded residual density modules, which effectively improves the quality of the density map layer by layer. (2) A novel additional local count loss is presented to refine the accuracy of crowd counting; it reduces the errors of the pixel-wise Euclidean loss by restricting the number of people in local crowd areas. Experiments on two public benchmark datasets show that the proposed method achieves effective improvement compared with state-of-the-art methods.
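
The idea of constraining counts in local areas can be sketched as below. The abstract does not give the exact form of the loss, so the non-overlapping square patches, the mean absolute difference of patch counts, and the names `local_counts` and `local_count_loss` are assumptions chosen for illustration only.

```python
def local_counts(density, patch):
    """Sum a 2D density map (list of rows) over non-overlapping patches."""
    rows, cols = len(density), len(density[0])
    counts = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            counts.append(sum(
                density[i][j]
                for i in range(r, min(r + patch, rows))
                for j in range(c, min(c + patch, cols))
            ))
    return counts


def local_count_loss(pred, target, patch=2):
    """Mean absolute difference between per-patch counts (assumed form)."""
    p = local_counts(pred, patch)
    t = local_counts(target, patch)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


if __name__ == "__main__":
    pred = [[0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
    target = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]
    print(local_count_loss(pred, target))
```

In this toy case the predicted and target maps differ pixel by pixel but agree on the number of people in every 2x2 patch, which is the kind of agreement a local count term rewards.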

Computer Vision and Pattern Recognition
