This paper describes HB Deid, a freely available web-based demonstrator. HB Deid identifies so-called protected health information (PHI) in text written in Swedish and removes it, masks it, or replaces it with surrogates (pseudonyms). PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a conditional random field (CRF) model trained on non-sensitive annotated Swedish text, followed by a rule-based post-processing step. The final step in obscuring the PHI is to mask it, show only its class name, or replace it using a rule-based pseudonymisation system.
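The pipeline described above (model-based tagging, rule-based post-processing, then one of three obscuring modes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HB Deid's implementation. The CRF step is replaced by hard-coded name spans. Everything else in the sketch is hypothetical: the regex rules, the class labels such as `First_Name` and `Phone_Number`, the surrogate lists, the `*` masking character, and the helper functions `rule_spans`, `merge`, `surrogate` and `obscure`.

```python
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PHI span: character offsets into the text plus a class label."""
    start: int
    end: int
    label: str


# Hypothetical post-processing rules; the source does not specify HB Deid's rules.
RULES = {
    "Phone_Number": re.compile(r"\b0\d{1,3}-\d{5,8}\b"),
    "Full_Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "Age": re.compile(r"\b\d{1,3}(?= år\b)"),
}

# Hypothetical surrogate pools for name classes.
SURROGATES = {
    "First_Name": ["Anna", "Erik", "Maria"],
    "Last_Name": ["Lindqvist", "Berg", "Holm"],
}


def rule_spans(text):
    """Return every span matched by the rule patterns."""
    return [Span(m.start(), m.end(), label)
            for label, pattern in RULES.items()
            for m in pattern.finditer(text)]


def merge(model_spans, extra_spans):
    """Keep all model spans (assumed non-overlapping); add rule spans that overlap nothing kept."""
    kept = list(model_spans)
    for s in sorted(extra_spans, key=lambda s: s.start):
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def surrogate(label, original, cache, rng):
    """Return a replacement for one PHI string, reusing it for repeated mentions."""
    key = (label, original)
    if key not in cache:
        pool = SURROGATES.get(label)
        if pool:
            # Cycle through the pool so distinct originals get distinct surrogates.
            used = sum(1 for lab, _ in cache if lab == label)
            cache[key] = pool[used % len(pool)]
        elif any(ch.isdigit() for ch in original):
            # Replace each digit at random, keeping separators and length.
            cache[key] = re.sub(r"\d", lambda _: str(rng.randrange(10)), original)
        else:
            # No rule for this class: fall back to showing the class name.
            cache[key] = f"<{label}>"
    return cache[key]


def obscure(text, spans, mode="mask", seed=0):
    """Obscure non-overlapping PHI spans by masking, class name, or pseudonym."""
    if mode not in ("mask", "class", "pseudonym"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    cache = {}
    out, pos = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        out.append(text[pos:s.start])
        original = text[s.start:s.end]
        if mode == "mask":
            out.append("*" * len(original))
        elif mode == "class":
            out.append(f"<{s.label}>")
        else:
            out.append(surrogate(s.label, original, cache, rng))
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    text = "Patienten Karl Svensson, 45 år, ringde 070-1234567 den 2021-03-04."
    # Stand-in for the CRF tagger's output (name spans only, for illustration).
    model = [Span(10, 14, "First_Name"), Span(15, 23, "Last_Name")]
    spans = merge(model, rule_spans(text))
    for mode in ("mask", "class", "pseudonym"):
        print(mode, "->", obscure(text, spans, mode))
```

Caching surrogates per (class, string) keeps pseudonyms consistent within a document, so repeated mentions of the same person map to the same fake name. Randomizing digits while keeping separators preserves the surface format of phone numbers and dates. A real system would also need replacement dates to remain valid and mutually consistent, which this sketch does not guarantee.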