
An Investigation towards Differentially Private Sequence Tagging in a Federated Framework



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor



Abstract


To build machine learning-based applications for sensitive domains such as medicine and law, where digitized text contains private information, the text must be anonymized to preserve privacy. Sequence tagging, e.g. as done in Named Entity Recognition (NER), can help to detect private information. However, training sequence tagging models requires a sufficient amount of labeled data, and in privacy-sensitive domains such labeled data cannot be shared directly either. In this paper, we investigate the applicability of a privacy-preserving framework for sequence tagging tasks, specifically NER. To this end, we analyze a framework for the NER task that incorporates two levels of privacy protection. Firstly, we deploy a federated learning (FL) framework in which the labeled data are shared with neither the centralized server nor the peer clients. Secondly, we apply differential privacy (DP) while the models are being trained in each client instance. While both privacy measures are suitable for privacy-aware models, their combination results in unstable models. To our knowledge, this is the first study of its kind on privacy-aware sequence tagging models.
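The two levels of protection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the model, the aggregation rule, or the DP mechanism. The sketch assumes a simple linear model instead of a sequence tagger, plain weight averaging on the server, and a DP-SGD-style client step (per-example gradient clipping plus Gaussian noise). All function and parameter names (`clip`, `dp_client_update`, `federated_round`, `max_norm`, `noise_multiplier`) are hypothetical.

```python
import math
import random

def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def dp_client_update(weights, data, lr, max_norm, noise_multiplier, rng):
    """One noisy gradient step on a client's private (features, label) pairs."""
    grad_sum = [0.0] * len(weights)
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        # Per-example gradient of the squared error 0.5 * (pred - y)^2.
        grad = [(pred - y) * xi for xi in x]
        # Clipping bounds each example's influence on the update.
        grad_sum = [g + c for g, c in zip(grad_sum, clip(grad, max_norm))]
    # Gaussian noise scaled to the clipping norm (assumed mechanism).
    noisy = [g + rng.gauss(0.0, noise_multiplier * max_norm) for g in grad_sum]
    return [w - lr * g / len(data) for w, g in zip(weights, noisy)]

def federated_round(global_weights, client_datasets, **kwargs):
    """Server step: average client models; labeled data stays on each client."""
    updates = [dp_client_update(list(global_weights), d, **kwargs)
               for d in client_datasets]
    return [sum(u[i] for u in updates) / len(updates)
            for i in range(len(global_weights))]

# Toy usage: two clients with made-up data, five federated rounds.
rng = random.Random(0)
clients = [
    [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)],
    [([1.0, 1.0], 0.0)],
]
weights = [0.0, 0.0]
for _ in range(5):
    weights = federated_round(weights, clients, lr=0.1, max_norm=1.0,
                              noise_multiplier=0.5, rng=rng)
```

In this sketch only model weights travel between clients and server. The noise added on every client at every round is one plausible source of the kind of instability the abstract reports when the two measures are combined, although the abstract itself does not state the cause.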

References used

https://aclanthology.org/


ER-AE: Differentially Private Text Generation for Authorship Anonymization

Association for Computational Linguistics, 2021 (article)

Most privacy protection studies for textual data focus on removing explicit sensitive identifiers. However, personal writing style, a strong indicator of authorship, is often neglected. Recent studies, such as SynTF, have shown promising results on privacy-preserving text mining. However, their anonymization algorithm can only output numeric term vectors, which are difficult for the recipients to interpret. We propose a novel text generation model with a two-set exponential mechanism for authorship anonymization. By augmenting the semantic information through a REINFORCE training reward function, the model can generate differentially private text that has close semantics and a similar grammatical structure to the original text while removing personal traits of the writing style. It does not assume any conditioned labels or parallel text data for training. We evaluate the performance of the proposed model on a real-life peer reviews dataset and the Yelp review dataset. The results suggest that our model outperforms the state of the art on semantic preservation, authorship obfuscation, and stylometric transformation.


TransPrompt: Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

Association for Computational Linguistics, 2021 (article)

Recent studies have shown that prompts improve the performance of large pre-trained language models for few-shot text classification. Yet, it is unclear how the prompting knowledge can be transferred across similar NLP tasks for the purpose of mutual reinforcement. Based on continuous prompt embeddings, we propose TransPrompt, a transferable prompting framework for few-shot learning across similar tasks. In TransPrompt, we employ a multi-task meta-knowledge acquisition procedure to train a meta-learner that captures cross-task transferable knowledge. Two de-biasing techniques are further designed to make it more task-agnostic and unbiased towards any task. After that, the meta-learner can be adapted to target tasks with high accuracy. Extensive experiments show that TransPrompt outperforms strong single-task and cross-task baselines over multiple NLP tasks and datasets. We further show that the meta-learner can effectively improve the performance on previously unseen tasks, and that TransPrompt also outperforms strong fine-tuning baselines when learning with full training sets.


A Secure and Efficient Federated Learning Framework for NLP

Association for Computational Linguistics, 2021 (article)

In this work, we consider the problem of designing secure and efficient federated learning (FL) frameworks for NLP. Existing solutions in the literature either assume a trusted aggregator or require heavyweight cryptographic primitives, which significantly degrade performance. Moreover, many existing secure FL designs work only under the restrictive assumption that no client drops out of the training protocol. To tackle these problems, we propose SEFL, a secure and efficient federated learning framework that (1) eliminates the need for trusted entities; (2) achieves similar or even better model accuracy compared with existing FL designs; and (3) is resilient to client dropouts.


RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging

Association for Computational Linguistics, 2021 (article)

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, existing models for this task have suffered from a robustness issue, i.e., performance drops dramatically when testing on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As is common for most tagging models for text generation, the model's outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems when transferring to another dataset.


SpanAlign: Efficient Sequence Tagging Annotation Projection into Translated Data applied to Cross-Lingual Opinion Mining

Association for Computational Linguistics, 2021 (article)

Following the increasing performance of neural machine translation systems, the paradigm of using automatically translated data for cross-lingual adaptation is now studied in several application domains. The capacity to accurately project annotations remains, however, an issue for sequence tagging tasks where annotations must be projected with correct spans. Additionally, when the task involves noisy user-generated text, the quality of translation and annotation projection can be affected. In this paper, we propose to tackle multilingual sequence tagging with a new span alignment method and apply it to opinion target extraction from customer reviews. We show that, given suitable heuristics, translated data with automatic span-level annotation projection can yield improvements both for cross-lingual adaptation compared to zero-shot transfer and for data augmentation compared to a multilingual baseline.

