أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Jiezhong Qiu

Modeling Protein Using Large-scale Pretrain Language Model

87 - Yijia Xiao , Jiezhong Qiu , Ziang Li 2021

Protein is linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein analysis meth ods tend to be labor-intensive and time-consuming. The emergence of deep learning models makes modeling data patterns in large quantities of data possible. Interdisciplinary researchers have begun to leverage deep learning methods to model large biological datasets, e.g. using long short-term memory and convolutional neural network for protein sequence classification. After millions of years of evolution, evolutionary information is encoded in protein sequences. Inspired by the similarity between natural language and protein sequences, we use large-scale language models to model evolutionary-scale protein sequences, encoding protein biology information in representation. Significant improvements are observed in both token-level and sequence-level tasks, demonstrating that our large-scale model can accurately capture evolution information from pretraining on evolutionary-scale individual sequences. Our code and model are available at https://github.com/THUDM/ProteinLM.

التعلم الآلي الحساب واللغة الجزيئات الحيوية

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

113 - Jiezhong Qiu , Qibin Chen , Yuxiao Dong 2020

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph class ification. However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) -- a self-supervised graph neural network pre-training framework -- to capture the universal network topological properties across multiple networks. We design GCCs pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.

التعلم الآلي الشبكات الاجتماعية والمعلومات التعلم الالي

The Lifecycle and Cascade of WeChat Social Messaging Groups

38 - Jiezhong Qiu , Yixuan Li , Jie Tang 2015

Social instant messaging services are emerging as a transformative form with which people connect, communicate with friends in their daily life - they catalyze the formation of social groups, and they bring people stronger sense of community and conn ection. However, research community still knows little about the formation and evolution of groups in the context of social messaging - their lifecycles, the change in their underlying structures over time, and the diffusion processes by which they develop new members. In this paper, we analyze the daily usage logs from WeChat group messaging platform - the largest standalone messaging communication service in China - with the goal of understanding the processes by which social messaging groups come together, grow new members, and evolve over time. Specifically, we discover a strong dichotomy among groups in terms of their lifecycle, and develop a separability model by taking into account a broad range of group-level features, showing that long-term and short-term groups are inherently distinct. We also found that the lifecycle of messaging groups is largely dependent on their social roles and functions in users daily social experiences and specific purposes. Given the strong separability between the long-term and short-term groups, we further address the problem concerning the early prediction of successful communities. In addition to modeling the growth and evolution from group-level perspective, we investigate the individual-level attributes of group members and study the diffusion process by which groups gain new members. By considering members historical engagement behavior as well as the local social network structure that they embedded in, we develop a membership cascade model and demonstrate the effectiveness by achieving AUC of 95.31% in predicting inviter, and an AUC of 98.66% in predicting invitee.

الشبكات الاجتماعية والمعلومات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد