
De-identification of Privacy-related Entities in Job Postings


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: privacy-related entities, detecting privacy-related entities, job postings, job vacancies


Abstract

De-identification is the task of detecting privacy-related entities in text, such as person names, emails, and contact data. It has been well studied within the medical domain. The need for de-identification technology is increasing, as privacy-preserving data handling is in high demand in many domains. In this paper, we focus on job postings. We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow. We introduce baselines comparing Long Short-Term Memory (LSTM) and Transformer models. To improve these baselines, we experiment with BERT representations and with distantly related auxiliary data via multi-task learning. Our results show that auxiliary data helps to improve de-identification performance. While BERT representations improve performance, surprisingly, "vanilla" BERT turned out to be more effective than BERT trained on Stackoverflow-related data.
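De-identification of this kind is usually framed as token-level sequence labeling, with the LSTM or Transformer tagger emitting one tag per token. The sketch below assumes a BIO tagging scheme and uses hypothetical labels (`NAME`, `EMAIL`) that are not taken from the JobStack annotation guidelines. It shows only the step after tagging: decoding the predicted tags into entity spans and replacing each span with a placeholder.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open span simply extends it.
        if tag.startswith("I-") and tag[2:] == label:
            continue
        # Any other tag closes the currently open span, if there is one.
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- starts a span; a stray I- (no matching open span) is also treated as a start.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def mask_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace each detected entity span with a single [LABEL] placeholder token."""
    out: List[str] = []
    prev = 0
    for start, end, label in bio_to_spans(tags):
        out.extend(tokens[prev:start])  # keep non-entity tokens unchanged
        out.append(f"[{label}]")
        prev = end
    out.extend(tokens[prev:])
    return out


# Hypothetical job-posting snippet with hypothetical predicted tags.
tokens = ["Contact", "Jane", "Doe", "at", "jane@example.com", "for", "details"]
tags = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL", "O", "O"]
print(" ".join(mask_entities(tokens, tags)))  # Contact [NAME] at [EMAIL] for details
```

Treating a stray `I-` tag as the start of a new span is one common repair for ill-formed tagger output; stricter evaluations may instead discard such spans.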


Related research

HB Deid - HB De-identification tool demonstrator

Association for Computational Linguistics, 2021 (article)

This paper describes a freely available web-based demonstrator called HB Deid. HB Deid identifies so-called protected health information (PHI) in text written in Swedish and removes, masks, or replaces it with surrogates or pseudonyms. PHI are named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a CRF model trained on non-sensitive annotated Swedish text together with a rule-based post-processing step. The final step in obscuring the PHI is either to mask it, to show only its class name, or to replace it using a rule-based pseudonymisation system.

Keywords: de-identification tool demonstrator, de-identification tool, deid
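The three obscuring options described for HB Deid (masking, showing only the class name, and pseudonymisation) can be sketched as follows. This is not HB Deid's code. The class names (`First_Name`, `Location`) and the tiny surrogate lists are hypothetical, and real pseudonymisation rules (for example, preserving gender or date formats) are omitted. The input spans are character offsets assumed to come from an earlier detection step.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical surrogate lists; a real system would use larger,
# language-specific resources and class-specific rules.
SURROGATES: Dict[str, List[str]] = {
    "First_Name": ["Anna", "Erik", "Maria"],
    "Location": ["Uppsala", "Lund", "Umeå"],
}


def obscure(text: str, spans: List[Tuple[int, int, str]], strategy: str, seed: int = 0) -> str:
    """Obscure non-overlapping character spans (start, end, class) with one strategy."""
    rng = random.Random(seed)  # seeded so pseudonym choices are reproducible
    pieces: List[str] = []
    prev = 0
    for start, end, cls in sorted(spans):
        pieces.append(text[prev:start])
        original = text[start:end]
        if strategy == "mask":
            pieces.append("X" * len(original))
        elif strategy == "class":
            pieces.append(f"<{cls}>")
        elif strategy == "pseudonym":
            # Pick a surrogate of the same class that differs from the original;
            # fall back to the class name when no surrogate is available.
            candidates = [s for s in SURROGATES.get(cls, []) if s != original]
            pieces.append(rng.choice(candidates) if candidates else f"<{cls}>")
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        prev = end
    pieces.append(text[prev:])
    return "".join(pieces)


text = "Anna bor i Lund."  # Swedish: "Anna lives in Lund."
spans = [(0, 4, "First_Name"), (11, 15, "Location")]
print(obscure(text, spans, "mask"))   # XXXX bor i XXXX.
print(obscure(text, spans, "class"))  # <First_Name> bor i <Location>.
print(obscure(text, spans, "pseudonym"))
```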

Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification

Association for Computational Linguistics, 2021 (article)

Building tools to remove sensitive information such as personal names, addresses, and telephone numbers, so-called Protected Health Information (PHI), from clinical free text is an important task for making clinical texts available for research. The quality of these de-identification tools must be assessed in terms of precision and recall. To assess such tools, gold standards (annotated clinical text) must be available. Such gold standards exist for larger languages; for Norwegian, however, there are no such resources. Therefore, an existing Norwegian synthetic clinical corpus, NorSynthClinical, has been extended with PHI and annotated by two annotators, with an inter-annotator agreement of 0.94 F1-measure. In total, the corpus, called NorSynthClinical PHI, contains 409 annotated PHI instances. A hybrid de-identification tool for Norwegian, combining machine learning and rule-based methods, was developed and trained with openly available resources. When tested on NorSynthClinical PHI, it obtained an overall F1-measure of 0.73 and a recall of 0.62. NorSynthClinical PHI is openly available on GitHub for use by the research community.

Keywords: job vacancies, protected health information
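Precision, recall, and F1 for de-identification are commonly computed over entity spans. The sketch below assumes strict matching, where a prediction counts only if its start, end, and class all equal a gold annotation. The abstract does not state which matching criterion was used, so this choice, and the example spans, are assumptions.

```python
from typing import Iterable, Set, Tuple

# (start offset, end offset, PHI class)
Span = Tuple[int, int, str]


def prf1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Strict span-level precision, recall, and F1 (harmonic mean of P and R)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    tp = len(gold_set & pred_set)  # exact matches on offsets and class
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations: two of three predictions match the gold spans.
gold = [(0, 4, "NAME"), (10, 18, "PHONE"), (25, 35, "DATE")]
pred = [(0, 4, "NAME"), (10, 18, "PHONE"), (40, 45, "NAME")]
print(prf1(gold, pred))  # each value is 2/3
```

Lenient variants, which credit partial overlaps or ignore the class, usually yield higher scores, so the matching criterion should be reported alongside the numbers.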

Adapting Entities across Languages and Cultures

Association for Computational Linguistics, 2021 (article)

How would you explain Bill Gates to a German? He is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts. This type of translation is called adaptation in the translation community. Until now, this task has not been done computationally. Automatic adaptation could be used in natural language processing for machine translation and, indirectly, for generating new question answering datasets and for education. We propose two automatic methods for this novel NLP task and compare them to human results. First, a structured knowledge base adapts named entities using their shared properties. Second, vector arithmetic and orthogonal embedding mapping methods identify better candidates, but at the expense of interpretable features. We evaluate our methods on a new dataset of human adaptations.

Keywords: explain Bill Gates, languages and cultures, cultures
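The vector-arithmetic idea can be illustrated with the classic analogy offset: compute `Gates - United_States + Germany` and return the nearest remaining entity by cosine similarity. This is a generic sketch rather than the paper's exact method, and it does not cover the orthogonal embedding mapping. The 3-dimensional embeddings are invented purely so the example is small and checkable; real entity embeddings are learned and much larger.

```python
import math
from typing import Dict, List

Vector = List[float]

# Toy embeddings invented for illustration only (not learned vectors).
EMB: Dict[str, Vector] = {
    "Bill_Gates": [0.9, 0.8, 0.1],
    "United_States": [0.1, 0.9, 0.0],
    "Germany": [0.1, 0.0, 0.9],
    "Carl_Benz": [0.85, 0.05, 0.95],
    "Angela_Merkel": [0.2, 0.1, 0.9],
    "Steve_Jobs": [0.85, 0.75, 0.15],
}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def adapt(entity: str, source_ctx: str, target_ctx: str, emb: Dict[str, Vector]) -> str:
    """Return the entity closest to entity - source_ctx + target_ctx."""
    query = [e - s + t for e, s, t in zip(emb[entity], emb[source_ctx], emb[target_ctx])]
    # Exclude the query terms themselves, as is usual for analogy queries.
    candidates = [w for w in emb if w not in {entity, source_ctx, target_ctx}]
    return max(candidates, key=lambda w: cosine(query, emb[w]))


print(adapt("Bill_Gates", "United_States", "Germany", EMB))  # Carl_Benz
```

As the abstract notes, the result comes with no explanation of *why* Benz was chosen, unlike the knowledge-base method, which matches on explicit shared properties.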

Improving Neural Language Processing with Named Entities

Association for Computational Linguistics, 2021 (article)

Pretraining-based neural network models have demonstrated state-of-the-art (SOTA) performance on natural language processing (NLP) tasks. The sentence representation most frequently used by neural NLP methods is a sequence of subwords. This differs from the representations used by non-neural methods, which are built with basic NLP technologies such as part-of-speech (POS) tagging, named entity (NE) recognition, and parsing. Most neural NLP models receive only vectors encoded from the subword sequence of the input text. However, basic NLP information such as POS tags, NEs, and parsing results cannot be obtained explicitly from the large unlabeled text used to pretrain these models. To reveal the effectiveness of basic NLP information, this paper explores the use of NEs in two Japanese tasks, document classification and headline generation, using Transformer-based models. Experimental results with eight basic NE types and approximately 200 extended NE types show that NEs improve accuracy even when a large pretrained model trained on 70 GB of text data is used.

Keywords: improving neural language, neural language processing
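One simple way to make NE information explicit to a subword-based Transformer is to wrap each recognized entity in tag markers before subword tokenization. The abstract does not say how the authors encoded NEs, so this is only one possible scheme, shown as an assumption. The label names are hypothetical, and English tokens stand in for Japanese text for readability.

```python
from typing import List, Tuple


def insert_ne_markers(tokens: List[str], spans: List[Tuple[int, int, str]]) -> str:
    """Wrap non-overlapping NE spans (start, end exclusive, label) in <LABEL> ... </LABEL> markers."""
    out: List[str] = []
    prev = 0
    for start, end, label in sorted(spans):
        out.extend(tokens[prev:start])  # tokens outside any entity are copied as-is
        out.append(f"<{label}>")
        out.extend(tokens[start:end])
        out.append(f"</{label}>")
        prev = end
    out.extend(tokens[prev:])
    return " ".join(out)


tokens = ["Toyota", "opened", "a", "plant", "in", "Aichi"]
print(insert_ne_markers(tokens, [(0, 1, "ORG"), (5, 6, "LOC")]))
# <ORG> Toyota </ORG> opened a plant in <LOC> Aichi </LOC>
```

In practice the markers would usually be registered as special tokens so that the subword tokenizer does not split them into pieces.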

Organizational Values Related to Management Style and Their Impact on Employees' Job Performance: A Field Study on Private Hospitals in Lattakia Governorate

Al-Baath University, 2017 (research paper)

The aim of the study was to examine the effect of organizational values related to management (strength, elite, reward) on the performance of employees in private hospitals in Lattakia Governorate.

Keywords: performance, job performance, private hospitals, organizational values
