أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Zhaochun Ren

Membership Inference Attacks Against Recommender Systems

144 - Minxing Zhang , Zhaochun Ren , Zihan Wang 2021

Recently, recommender systems have achieved promising performances and become one of the most widely used web applications. However, recommender systems are often trained on highly sensitive user data, thus potential data leakage from recommender sys tems may lead to severe privacy problems. In this paper, we make the first attempt on quantifying the privacy leakage of recommender systems through the lens of membership inference. In contrast with traditional membership inference against machine learning classifiers, our attack faces two main differences. First, our attack is on the user-level but not on the data sample-level. Second, the adversary can only observe the ordered recommended items from a recommender system instead of prediction results in the form of posterior probabilities. To address the above challenges, we propose a novel method by representing users from relevant items. Moreover, a shadow recommender is established to derive the labeled training data for training the attack model. Extensive experimental results show that our attack framework achieves a strong performance. In addition, we design a defense mechanism to effectively mitigate the membership inference threat of recommender systems.

التشفير والأمن التعلم الآلي

Cross-Domain Contract Element Extraction with a Bi-directional Feedback Clause-Element Relation Network

140 - Zihan Wang , Hongye Song , Zhaochun Ren 2021

Contract element extraction (CEE) is the novel task of automatically identifying and extracting legally relevant elements such as contract dates, payments, and legislation references from contracts. Automatic methods for this task view it as a sequen ce labeling problem and dramatically reduce human labor. However, as contract genres and element types may vary widely, a significant challenge for this sequence labeling task is how to transfer knowledge from one domain to another, i.e., cross-domain CEE. Cross-domain CEE differs from cross-domain named entity recognition (NER) in two important ways. First, contract elements are far more fine-grained than named entities, which hinders the transfer of extractors. Second, the extraction zones for cross-domain CEE are much larger than for cross-domain NER. As a result, the contexts of elements from different domains can be more diverse. We propose a framework, the Bi-directional Feedback cLause-Element relaTion network (Bi-FLEET), for the cross-domain CEE task that addresses the above challenges. Bi-FLEET has three main components: (1) a context encoder, (2) a clause-element relation encoder, and (3) an inference layer. To incorporate invariant knowledge about element and clause types, a clause-element graph is constructed across domains and a hierarchical graph neural network is adopted in the clause-element relation encoder. To reduce the influence of context variations, a multi-task framework with a bi-directional feedback scheme is designed in the inference layer, conducting both clause classification and element extraction. The experimental results over both cross-domain NER and CEE tasks show that Bi-FLEET significantly outperforms state-of-the-art baselines.

استرجاع المعلومات الحساب واللغة

Semi-Supervised Variational Reasoning for Medical Dialogue Generation

101 - Dongdong Li , Zhaochun Ren , Pengjie Ren 2021

Medical dialogue generation aims to provide automatic and accurate responses to assist physicians to obtain diagnosis and treatment suggestions in an efficient manner. In medical dialogues two key characteristics are relevant for response generation: patient states (such as symptoms, medication) and physician actions (such as diagnosis, treatments). In medical scenarios large-scale human annotations are usually not available, due to the high costs and privacy requirements. Hence, current approaches to medical dialogue generation typically do not explicitly account for patient states and physician actions, and focus on implicit representation instead. We propose an end-to-end variational reasoning approach to medical dialogue generation. To be able to deal with a limited amount of labeled data, we introduce both patient state and physician action as latent variables with categorical priors for explicit patient state tracking and physician policy learning, respectively. We propose a variational Bayesian generative approach to approximate posterior distributions over patient states and physician actions. We use an efficient stochastic gradient variational Bayes estimator to optimize the derived evidence lower bound, where a 2-stage collapsed inference method is proposed to reduce the bias during model training. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability. We conduct experiments on three datasets collected from medical platforms. Our experimental results show that the proposed method outperforms state-of-the-art baselines in terms of objective and subjective evaluation metrics. Our experiments also indicate that our proposed semi-supervised reasoning method achieves a comparable performance as state-of-the-art fully supervised learning baselines for physician policy learning.

الحساب واللغة الذكاء الاصطناعي

Meaningful Answer Generation of E-Commerce Question-Answering

284 - Shen Gao , Xiuying Chen , Zhaochun Ren 2020

In e-commerce portals, generating answers for product-related questions has become a crucial task. In this paper, we focus on the task of product-aware answer generation, which learns to generate an accurate and complete answer from large-scale unlab eled e-commerce reviews and product attributes. However, safe answer problems pose significant challenges to text generation tasks, and e-commerce question-answering task is no exception. To generate more meaningful answers, in this paper, we propose a novel generative neural model, called the Meaningful Product Answer Generator (MPAG), which alleviates the safe answer problem by taking product reviews, product attributes, and a prototype answer into consideration. Product reviews and product attributes are used to provide meaningful content, while the prototype answer can yield a more diverse answer pattern. To this end, we propose a novel answer generator with a review reasoning module and a prototype answer reader. Our key idea is to obtain the correct question-aware information from a large scale collection of reviews and learn how to write a coherent and meaningful answer from an existing prototype answer. To be more specific, we propose a read-and-write memory consisting of selective writing units to conduct reasoning among these reviews. We then employ a prototype reader consisting of comprehensive matching to extract the answer skeleton from the prototype answer. Finally, we propose an answer editor to generate the final answer by taking the question and the above parts as input. Conducted on a real-world dataset collected from an e-commerce platform, extensive experimental results show that our model achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. Human evaluation also demonstrates that our model can consistently generate specific and proper answers.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information

109 - Shen Gao , Xiuying Chen , Zhaochun Ren 2020

Text summarization is the research area aiming at creating a short and condensed version of the original document, which conveys the main idea of the document in a few words. This research topic has started to attract the attention of a large communi ty of researchers, and it is nowadays counted as one of the most promising research areas. In general, text summarization algorithms aim at using a plain text document as input and then output a summary. However, in real-world applications, most of the data is not in a plain text format. Instead, there is much manifold information to be summarized, such as the summary for a web page based on a query in the search engine, extreme long document (e.g., academic paper), dialog history and so on. In this paper, we focus on the survey of these new summarization tasks and approaches in the real-world application.

الحساب واللغة استرجاع المعلومات

Conversations with Search Engines: SERP-based Conversational Response Generation

114 - Pengjie Ren , Zhumin Chen , Zhaochun Ren 2020

In this paper, we address the problem of answering complex information needs by conversing conversations with search engines, in the sense that users can express their queries in natural language, and directly receivethe information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs, or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this paper: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of astate-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE), using this dataset. SaaC is built based on a multi-turn conversational search dataset, where we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state-of-the-art by introducing a supporting token identification module and aprior-aware pointer generator, which enables us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic.

استرجاع المعلومات الذكاء الاصطناعي

Preserving Local and Global Information for Network Embedding

89 - Yao Ma 2017

Networks such as social networks, airplane networks, and citation networks are ubiquitous. The adjacency matrix is often adopted to represent a network, which is usually high dimensional and sparse. However, to apply advanced machine learning algorit hms to network data, low-dimensional and continuous representations are desired. To achieve this goal, many network embedding methods have been proposed recently. The majority of existing methods facilitate the local information i.e. local connections between nodes, to learn the representations, while completely neglecting global information (or node status), which has been proven to boost numerous network mining tasks such as link prediction and social recommendation. Hence, it also has potential to advance network embedding. In this paper, we study the problem of preserving local and global information for network embedding. In particular, we introduce an approach to capture global information and propose a network embedding framework LOG, which can coherently model {bf LO}cal and {bf G}lobal information. Experimental results demonstrate the ability to preserve global information of the proposed framework. Further experiments are conducted to demonstrate the effectiveness of learned representations of the proposed framework.

الشبكات الاجتماعية والمعلومات الفيزياء والمجتمع

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد