Research papers, master's and doctoral theses about health

Detecting Health Advice in Medical Research Literature

205 - Association for Computational Linguistics 2021 article

Health and medical researchers often give clinical and policy recommendations to inform health practice and public health policy. However, no current health information system supports the direct retrieval of health advice. This study fills the gap by developing and validating an NLP-based prediction model for identifying health advice in research publications. We annotated a corpus of 6,000 sentences extracted from structured abstracts in PubMed publications as "strong advice", "weak advice", or "no advice", and developed a BERT-based model that can predict, with a macro-averaged F1-score of 0.93, whether a sentence gives strong advice, weak advice, or no advice. The prediction model generalized well to sentences in both unstructured abstracts and discussion sections, where health advice normally appears. We also conducted a case study that applied this prediction model to retrieve specific health advice on COVID-19 treatments from LitCovid, a large COVID research literature portal, demonstrating the usefulness of retrieving health advice sentences as an advanced research literature navigation function for health researchers and the general public.
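For readers unfamiliar with the metric, the macro-averaged F1-score mentioned above is the unweighted mean of the per-class F1-scores. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function names and the toy labels in the usage note are assumptions made for illustration.

```python
from typing import List, Sequence

# The three classes named in the abstract.
LABELS = ["strong advice", "weak advice", "no advice"]


def per_class_f1(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating that class as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: List[str] = LABELS) -> float:
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)
```

Calling `macro_f1(gold, pred)` on two equal-length lists of label strings returns a value between 0 and 1. Because each class contributes equally, a model that ignores a rare class such as "strong advice" is penalized even if overall accuracy is high.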

health advice, advice, medical research literature

Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society

231 - Association for Computational Linguistics 2021 article

With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic. Fighting this infodemic has been declared one of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading xenophobia and panic. Addressing the issue requires solving a number of challenging problems such as identifying messages containing claims, determining their check-worthiness and factuality, and their potential to do harm as well as the nature of that harm, to mention just a few. To address this gap, we release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis that (i) focuses on COVID-19, (ii) combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society, and (iii) covers Arabic, Bulgarian, Dutch, and English. Finally, we show strong evaluation results using pretrained Transformers, thus confirming the practical utility of the dataset in monolingual vs. multilingual, and single task vs. multitask settings.

mixed multilingual, modeling the perspective, World Health Organization

Synthetic Data Generation and Multi-Task Learning for Extracting Temporal Information from Health-Related Narrative Text

202 - Association for Computational Linguistics 2021 article

Extracting temporal information is critical to processing health-related text. Temporal information extraction is a challenging task for language models because it requires processing both texts and numbers. Moreover, the fundamental challenge is how to obtain a large-scale training dataset. To address this, we propose a synthetic data generation algorithm. Also, we propose a novel multi-task temporal information extraction model and investigate whether multi-task learning can contribute to performance improvement by exploiting additional training signals with the existing training data. For experiments, we collected a custom dataset containing unstructured texts with temporal information of sleep-related activities. Experimental results show that utilising synthetic data can improve the performance when the augmentation factor is 3. The results also show that when multi-task learning is used with an appropriate amount of synthetic data, the performance can significantly improve from 82. to 88.6 and from 83.9 to 91.9 regarding micro- and macro-average exact match scores of normalised time prediction, respectively.

extracting temporal information, health-related narrative text, temporal information

Evidence-based Fact-Checking of Health-related Claims

213 - Association for Computational Linguistics 2021 article

The task of verifying the truthfulness of claims in textual documents, or fact-checking, has received significant attention in recent years. Many existing evidence-based fact-checking datasets contain synthetic claims, and the models trained on these data might not be able to verify real-world claims. In particular, few studies have addressed evidence-based fact-checking of health-related claims that require medical expertise or evidence from the scientific literature. In this paper, we introduce HEALTHVER, a new dataset for evidence-based fact-checking of health-related claims that allows studying the validity of real-world claims by evaluating their truthfulness against scientific articles. Using a three-step data creation method, we first retrieved real-world claims from snippets returned by a search engine for questions about COVID-19. Then we automatically retrieved and re-ranked relevant scientific papers using a T5 relevance-based model. Finally, the relations between each evidence statement and the associated claim were manually annotated as SUPPORT, REFUTE and NEUTRAL. To validate the created dataset of 14,330 evidence-claim pairs, we developed baseline models based on pretrained language models. Our experiments showed that training deep learning models on real-world medical claims greatly improves performance compared to models trained on synthetic and open-domain claims. Our results and manual analysis suggest that HEALTHVER provides a realistic and challenging dataset for future efforts on evidence-based fact-checking of health-related claims. The dataset, source code, and a leaderboard are available at https://github.com/sarrouti/healthver.

fact-checking of health-related, health-related claims, evidence-based fact-checking

Weakly Supervised Extractive Summarization with Attention

231 - Association for Computational Linguistics 2021 article

Automatic summarization aims to extract important information from large amounts of textual data in order to create a shorter version of the original texts while preserving their information. Training traditional extractive summarization models relies heavily on human-engineered labels such as sentence-level annotations of summary-worthiness. However, in many use cases, such human-engineered labels do not exist and manually annotating thousands of documents for the purpose of training models may not be feasible. On the other hand, indirect signals for summarization are often available, such as agent actions for customer service dialogues, headlines for news articles, diagnoses for Electronic Health Records, etc. In this paper, we develop a general framework that generates extractive summarization as a byproduct of supervised learning tasks for indirect signals with the help of an attention mechanism. We test our models on customer service dialogues, and experimental results demonstrate that our models can reliably select informative sentences and words for automatic summarization.

weakly supervised extractive, extractive summarization, electronic health records

Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification

160 - Association for Computational Linguistics 2021 article

Building tools to remove sensitive information such as personal names, addresses, and telephone numbers - so-called Protected Health Information (PHI) - from clinical free text is an important task to make clinical texts available for research. These de-identification tools must be assessed for quality in terms of precision and recall. To assess such tools, gold standards - annotated clinical text - must be available. Such gold standards exist for larger languages. For Norwegian, however, there are no such resources. Therefore, an already existing Norwegian synthetic clinical corpus, NorSynthClinical, has been extended with PHIs and annotated by two annotators, obtaining an inter-annotator agreement of 0.94 F1-measure. In total, the corpus has 409 annotated PHI instances and is called NorSynthClinical PHI. A hybrid de-identification tool (machine learning and rule-based methods) for Norwegian was developed and trained with openly available resources, and obtained an overall F1-measure of 0.73 and a recall of 0.62 when tested using NorSynthClinical PHI. NorSynthClinical PHI is openly available on GitHub to be used by the research community.
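The abstract reports an F1-measure and a recall but not the precision. Since F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the precision can be recovered as P = F1 * R / (2R - F1). The sketch below works through that arithmetic. It assumes the two reported figures use the same averaging and are rounded to two decimals, so the result is only an estimate, not a number stated by the authors; the function name is an illustrative choice.

```python
def precision_from_f1_and_recall(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        # No valid precision in (0, 1] is consistent with these inputs.
        raise ValueError("F1 must be smaller than twice the recall")
    return f1 * recall / denominator


# Figures taken from the abstract: overall F1 = 0.73, recall = 0.62.
estimated_precision = precision_from_f1_and_recall(0.73, 0.62)
```

Under those assumptions the estimate comes out at roughly 0.89, which would suggest a tool that is fairly precise on the PHI it flags but misses a sizeable share of instances. Because the inputs are rounded, the true value could differ by a few hundredths.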

job vacancies, protected health information, called protected health

Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media

194 - Association for Computational Linguistics 2021 article

ProfNER-ST focuses on the recognition of professions and occupations from Twitter using Spanish data. Our participation is based on a combination of word-level embeddings, including pre-trained Spanish BERT, as well as cosine similarity computed over a subset of entities that serve as input for an encoder-decoder architecture with an attention mechanism. Our best submission achieved an F1-measure of 0.823 on the official test set.
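As a brief illustration of the cosine-similarity component, the sketch below scores a token vector against a small set of entity vectors. The abstract does not say how the entity subset is chosen or how the scores are fed to the encoder-decoder, so the function names and toy vectors here are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_entities(token_vec: Sequence[float],
                  entity_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (entity, similarity) pairs, most similar first."""
    scores = [(name, cosine_similarity(token_vec, vec))
              for name, vec in entity_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In practice the vectors would come from an embedding model such as the pre-trained Spanish BERT mentioned above; `rank_entities` only shows how similarity scores over a fixed entity set could be turned into a ranked feature.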

health-related social media

Approaching SMM4H with auto-regressive language models and back-translation

148 - Association for Computational Linguistics 2021 article

We describe our submissions to the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (OGNLP) participated in the sub-task: Classification of tweets self-reporting potential cases of COVID-19 (Task 5). For our submissions, we employed systems based on auto-regressive transformer models (XLNet) and back-translation for balancing the dataset.
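Back-translation as a balancing strategy can be sketched as follows: minority-class tweets are translated into a pivot language and back, and the resulting paraphrases are added until every class matches the size of the largest one. The abstract gives no details of the team's procedure, so this is only one plausible reading. The `to_pivot` and `from_pivot` callables are hypothetical placeholders for whatever translation system is used; no real translation API is assumed.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (tweet text, class label)


def balance_with_back_translation(
    examples: List[Example],
    to_pivot: Callable[[str], str],    # hypothetical: source -> pivot language
    from_pivot: Callable[[str], str],  # hypothetical: pivot -> source language
    seed: int = 0,
) -> List[Example]:
    """Oversample smaller classes with back-translated copies."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    augmented = list(examples)  # original examples are kept unchanged
    for label, count in counts.items():
        pool = [text for text, lab in examples if lab == label]
        for _ in range(target - count):
            source_text = rng.choice(pool)
            paraphrase = from_pivot(to_pivot(source_text))
            augmented.append((paraphrase, label))
    return augmented
```

With real translation functions plugged in, the round trip tends to produce paraphrases rather than exact copies, which is the reason to prefer it over plain duplication. Augmentation of this kind would normally be applied to the training split only.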

machine learning, health applications

Topic Modeling for Maternal Health Using Reddit

194 - Association for Computational Linguistics 2021 article

This paper applies topic modeling to understand maternal health topics, concerns, and questions expressed in online communities on social networking sites. We examine Latent Dirichlet Analysis (LDA) and two state-of-the-art methods: neural topic model with knowledge distillation (KD) and Embedded Topic Model (ETM) on maternal health texts collected from Reddit. The models are evaluated on topic quality and topic inference, using both auto-evaluation metrics and human assessment. We analyze a disconnect between automatic metrics and human evaluations. While LDA performs the best overall with the auto-evaluation metrics NPMI and Coherence, Neural Topic Model with Knowledge Distillation is favored in expert evaluation. We also create a new partially expert annotated gold-standard maternal health topic

maternal health, latent dirichlet analysis

Cluster Analysis of Online Mental Health Discourse using Topic-Infused Deep Contextualized Representations

371 - Association for Computational Linguistics 2021 article

With mental health as a problem domain in NLP, the bulk of contemporary literature revolves around building better mental illness prediction models. The research focusing on the identification of discussion clusters in online mental health communities has been relatively limited. Moreover, as the underlying methodologies used in these studies mainly conform to the traditional machine learning models and statistical methods, the scope for introducing contextualized word representations for topic and theme extraction from online mental health communities remains open. Thus, in this research, we propose topic-infused deep contextualized representations, a novel data representation technique that uses autoencoders to combine deep contextual embeddings with topical information, generating robust representations for text clustering. Investigating the Reddit discourse on Post-Traumatic Stress Disorder (PTSD) and Complex Post-Traumatic Stress Disorder (C-PTSD), we elicit the thematic clusters representing the latent topics and themes discussed in the r/ptsd and r/CPTSD subreddits. Furthermore, we also present a qualitative analysis and characterization of each cluster, unraveling the prevalent discourse themes.

online mental health, mental health communities
