Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: health-related social media

Abstract

ProfNER-ST focuses on the recognition of professions and occupations from Twitter using Spanish data. Our participation is based on a combination of word-level embeddings, including pre-trained Spanish BERT, as well as cosine similarity computed over a subset of entities, which together serve as input to an encoder-decoder architecture with an attention mechanism. Our best submission achieved an F1-measure of 0.823 on the official test set.
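
The abstract does not say how the cosine similarity feature is computed or how it is fed to the model. As a rough illustration only, the sketch below compares a token vector with a small set of entity vectors and returns the closest entity with its score. The function names, the three-dimensional toy vectors, and the choice of the maximum as the feature are all assumptions, not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar_entity(token_vec, entity_vecs):
    """Return (entity_name, score) for the entity closest to the token.

    `entity_vecs` is a hypothetical mapping from entity strings to
    embedding vectors; the abstract does not describe its contents.
    """
    scored = [(name, cosine_similarity(token_vec, vec))
              for name, vec in entity_vecs.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
entity_vecs = {
    "enfermera": [0.9, 0.1, 0.0],
    "médico": [0.1, 0.9, 0.2],
}
token_vec = [0.8, 0.2, 0.0]
best_name, best_score = most_similar_entity(token_vec, entity_vecs)
```

In a setup like this, the score could be appended to each token's embedding as one extra input dimension, though whether the paper does so is not stated.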

References used

https://aclanthology.org/

Assessing multiple word embeddings for named entity recognition of professions and occupations in health-related social media

Association for Computational Linguistics, 2021 (article)

This paper presents our contribution to the ProfNER shared task. Our work focused on evaluating different pre-trained word embedding representations suitable for the task. We further explored combinations of embeddings in order to improve the overall results.

Keywords: UOB, assessing multiple word

Identifying professions & occupations in Health-related Social Media using Natural Language Processing

Association for Computational Linguistics, 2021 (article)

This paper describes the entry of the SINAI research group at SMM4H's ProfNER task on the identification of professions and occupations in health-related social media. Specifically, we participated in Task 7a (Tweet Binary Classification), which determines whether a tweet contains mentions of occupations, and in Task 7b (NER Offset Detection and Classification), which aims to predict occupation mentions and classify them by profession and working status.
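
Offset detection and classification is commonly scored by exact span matching, although the abstract above does not state the evaluation protocol. The sketch below assumes each mention is a `(start, end, label)` triple and counts a prediction as correct only when all three fields match a gold mention. The label names and character offsets are hypothetical placeholders.

```python
def span_f1(gold, predicted):
    """Strict-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mentions: the second prediction has the wrong end offset,
# so under strict matching it counts as both a miss and a false alarm.
gold = [(0, 9, "PROFESSION"), (25, 36, "WORKING_STATUS")]
pred = [(0, 9, "PROFESSION"), (25, 30, "WORKING_STATUS")]
precision, recall, f1 = span_f1(gold, pred)
```

With one of two predictions correct and one of two gold mentions found, precision, recall, and F1 all come out to 0.5 in this toy case.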

Keywords: machine learning

Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?

Association for Computational Linguistics, 2021 (article)

Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently shown that these embeddings reflect the same human biases that exist in the data used to train them. Most bias indicators introduced to reveal bias in word embeddings are average-based and rely on the cosine similarity measure. In this study, we examine the impact of different similarity measures, as well as descriptive techniques other than averaging, on measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed bias in word embeddings depends on the descriptive statistics and similarity measures used to measure it. We found that, over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.
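
For reference, the three measures named in this abstract have the standard definitions below for two embedding vectors $u$ and $v$. The covariance matrix $\Sigma$ in the Mahalanobis distance is assumed here to be estimated from the set of embedding vectors; the abstract does not say how the study obtains it.

```latex
\begin{align*}
\cos(u, v) &= \frac{u^\top v}{\lVert u \rVert_2 \, \lVert v \rVert_2} \\
d_E(u, v) &= \lVert u - v \rVert_2 = \sqrt{(u - v)^\top (u - v)} \\
d_M(u, v) &= \sqrt{(u - v)^\top \Sigma^{-1} (u - v)}
\end{align*}
```

Cosine is a similarity (larger means closer), while the other two are distances (smaller means closer). When $\Sigma$ is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.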

Keywords: review requirements, word embeddings

Synthetic Data Generation and Multi-Task Learning for Extracting Temporal Information from Health-Related Narrative Text

Association for Computational Linguistics, 2021 (article)

Extracting temporal information is critical to processing health-related text. Temporal information extraction is a challenging task for language models because it requires processing both text and numbers. Moreover, the fundamental challenge is how to obtain a large-scale training dataset. To address this, we propose a synthetic data generation algorithm. We also propose a novel multi-task temporal information extraction model and investigate whether multi-task learning can improve performance by exploiting additional training signals from the existing training data. For the experiments, we collected a custom dataset containing unstructured texts with temporal information about sleep-related activities. Experimental results show that utilising synthetic data can improve performance when the augmentation factor is 3. The results also show that when multi-task learning is used with an appropriate amount of synthetic data, performance can improve significantly, from 82. to 88.6 and from 83.9 to 91.9 in micro- and macro-average exact match scores of normalised time prediction, respectively.

Keywords: extracting temporal information, health-related narrative text, temporal information

The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF

Association for Computational Linguistics, 2021 (article)

While the production of information in the European early modern period is a well-researched topic, the question of how people engaged with the information explosion that occurred in early modern Europe is still underexplored. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media-related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to gain insight into the mediascape of early modern middle-class people from a historical perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful in answering historical research questions.

Keywords: early modern Dutch, early modern, modern Dutch mediascape
