HB Deid - HB De-identification tool demonstrator

Added by Association for Computational Linguistics - Article

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

This paper describes a freely available web-based demonstrator called HB Deid. HB Deid identifies so-called protected health information (PHI) in text written in Swedish and removes, masks, or replaces it with surrogates or pseudonyms. PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. HB Deid uses a CRF model trained on non-sensitive annotated Swedish text, together with a rule-based post-processing step, to find PHI. The final step in obscuring the PHI is to either mask it, show only its class name, or use a rule-based pseudonymisation system to replace it.
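The three obscuring options named in the abstract (mask, show the class name, replace with a pseudonym) can be illustrated with a minimal Python sketch. It assumes the PHI spans have already been found by some upstream tagger and do not overlap; the `Span` type, the `obscure` function, the label names, and the `SURROGATES` table are all hypothetical and are not taken from HB Deid itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # PHI class name (hypothetical label set)


# Hypothetical surrogate table; a real pseudonymiser would be far richer.
SURROGATES = {"First_Name": "Anna", "Location": "Uppsala"}


def obscure(text, spans, mode="mask"):
    """Return text with every (non-overlapping) span obscured according to mode."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        pieces.append(text[cursor:span.start])  # untouched text before the span
        original = text[span.start:span.end]
        if mode == "mask":
            pieces.append("X" * len(original))
        elif mode == "class":
            pieces.append(f"[{span.label}]")
        elif mode == "pseudonym":
            # Fall back to the class name when no surrogate is known.
            pieces.append(SURROGATES.get(span.label, f"[{span.label}]"))
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = span.end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


text = "Karin bor i Lund."
spans = [Span(0, 5, "First_Name"), Span(12, 16, "Location")]

for mode in ("mask", "class", "pseudonym"):
    print(mode, "->", obscure(text, spans, mode))
```

Keeping detection and obscuring separate, as here, means the same detected spans can be rendered in any of the three ways without re-running the tagger.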

References used

https://aclanthology.org/

De-identification of Privacy-related Entities in Job Postings

538 - Association for Computational Linguistics 2021 - Article

De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data. It has been well-studied within the medical domain. The need for de-identification technology is increasing, as privacy-preserving data handling is in high demand in many domains. In this paper, we focus on job postings. We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow. We introduce baselines, comparing Long Short-Term Memory (LSTM) and Transformer models. To improve these baselines, we experiment with BERT representations and with distantly related auxiliary data via multi-task learning. Our results show that auxiliary data helps to improve de-identification performance. While BERT representations improve performance, surprisingly, "vanilla" BERT turned out to be more effective than BERT trained on Stackoverflow-related data.

privacy-related entities, detecting privacy-related entities, job postings

Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification

632 - Association for Computational Linguistics 2021 - Article

Building tools to remove sensitive information such as personal names, addresses, and telephone numbers (so-called Protected Health Information, PHI) from clinical free text is an important task for making clinical texts available for research. These de-identification tools must be assessed for quality in terms of precision and recall. To assess such tools, gold standards (annotated clinical text) must be available. Such gold standards exist for larger languages. For Norwegian, however, there are no such resources. Therefore, an existing Norwegian synthetic clinical corpus, NorSynthClinical, has been extended with PHIs and annotated by two annotators, obtaining an inter-annotator agreement of 0.94 F1-measure. In total, the corpus has 409 annotated PHI instances and is called NorSynthClinical PHI. A hybrid de-identification tool (machine learning and rule-based methods) for Norwegian was developed and trained with openly available resources, and obtained an overall F1-measure of 0.73 and a recall of 0.62 when tested using NorSynthClinical PHI. NorSynthClinical PHI is openly available on GitHub for use by the research community.
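For readers unfamiliar with the measures mentioned above, the following Python sketch shows one common way to compute precision, recall, and F1 for de-identification output. It assumes exact-match scoring over `(start, end, label)` tuples; the abstract does not say whether the paper scores at span or token level, and the function name `precision_recall_f1` and the example annotations are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (start, end, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: four gold PHI spans, two of which the tool found.
gold = [(0, 5, "Name"), (12, 16, "Location"), (20, 30, "Phone"), (35, 37, "Age")]
predicted = [(0, 5, "Name"), (12, 16, "Location")]

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case every prediction is correct but half of the gold spans are missed, so precision is high and recall is low. For de-identification, low recall is usually the more serious problem, because a missed span is sensitive information left in the text.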

protected health information, called protected health

GECko+: a Grammatical and Discourse Error Correction Tool

687 - Association for Computational Linguistics 2021 - Article

We introduce GECko+, a web-based writing assistance tool for English that corrects errors at both the sentence and the discourse level. It is based on two state-of-the-art models for grammar error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.

discourse error correction, error correction tool

CombAlign: a Tool for Obtaining High-Quality Word Alignments

1114 - Association for Computational Linguistics 2021 - Article

Being able to generate accurate word alignments is useful for a variety of tasks. While statistical word aligners can work well, especially when parallel training data are plentiful, multilingual embedding models have recently been shown to give good results in unsupervised scenarios. We evaluate an ensemble method for word alignment on four language pairs and demonstrate that by combining multiple tools, taking advantage of their different approaches, substantial gains can be made. This holds for settings ranging from very low-resource to high-resource. Furthermore, we introduce a new gold alignment test set for Icelandic and a new easy-to-use tool for creating manual word alignments.
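The abstract does not say how the outputs of the individual aligners are combined. As one plausible illustration only, the Python sketch below keeps an alignment link when at least `min_votes` aligners propose it (a simple voting ensemble). The function name `vote_alignments`, the threshold, and the toy alignments are assumptions, not CombAlign's actual method.

```python
from collections import Counter


def vote_alignments(alignments, min_votes=2):
    """Keep each (source_index, target_index) link proposed by >= min_votes aligners."""
    # set(alignment) ensures one aligner cannot vote twice for the same link.
    counts = Counter(link for alignment in alignments for link in set(alignment))
    return sorted(link for link, votes in counts.items() if votes >= min_votes)


# Toy outputs from three hypothetical aligners for one sentence pair.
aligner_a = [(0, 0), (1, 1), (2, 2)]
aligner_b = [(0, 0), (1, 2), (2, 2)]
aligner_c = [(0, 0), (1, 1)]

combined = vote_alignments([aligner_a, aligner_b, aligner_c], min_votes=2)
print(combined)
```

Raising `min_votes` favours precision (only links most tools agree on survive), while lowering it favours recall.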

obtaining high-quality word, obtaining high-quality, high-quality word alignments

GEPSA, a tool for monitoring social challenges in digital press

974 - Association for Computational Linguistics 2021 - Article

This paper presents a platform for monitoring press narratives with respect to several social challenges, including gender equality, migrations and minority languages. As narratives are encoded in natural language, we have to use natural language processing techniques to automate their analysis. Thus, crawled news articles are processed by means of several NLP modules, including named entity recognition, keyword extraction, document classification for social challenge detection, and sentiment analysis. A Flask-powered interface provides data visualization for a user-based analysis of the data. This paper presents the architecture of the system and describes its different components in detail. Evaluation is provided for the modules related to extraction and classification of information regarding social challenges.

gepsa, social challenges, tool for monitoring
