Do you want to publish a course? Click here

xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem

XER: نموذج قابل للتفسير لتحليل الكيان باستخدام حل فعال لمشكلة تقسيم الزمرة

234   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

In this paper, we propose a global, self- explainable solution to solve a prominent NLP problem: Entity Resolution (ER). We formu- late ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise sim- ilarity scores between the mentions are used to associate these nodes to exactly one clique, which represents a real-world entity in the ER domain. In this paper, we use Clique Partition- ing Problem (CPP), which is an Integer Pro- gram (IP) to formulate ER as a graph partition- ing problem and then highlight the explainable nature of this method. Since CPP is NP-Hard, we introduce an efficient solution procedure, the xER algorithm, to solve CPP as a combi- nation of finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of using xER over the traditional methods and provide the computational exper- iments and results of applying this method to ER data sets.



References used
https://aclanthology.org/
rate research

Read More

The embedding-based large-scale query-document retrieval problem is a hot topic in the information retrieval (IR) field. Considering that pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, we present a Q uadrupletBERT model for effective and efficient retrieval in this paper. Unlike most existing BERT-style retrieval models, which only focus on the ranking phase in retrieval systems, our model makes considerable improvements to the retrieval phase and leverages the distances between simple negative and hard negative instances to obtaining better embeddings. Experimental results demonstrate that our QuadrupletBERT achieves state-of-the-art results in embedding-based large-scale retrieval tasks.
Sentence Compression (SC), which aims to shorten sentences while retaining important words that express the essential meanings, has been studied for many years in many languages, especially in English. However, improvements on Chinese SC task are sti ll quite few due to several difficulties: scarce of parallel corpora, different segmentation granularity of Chinese sentences, and imperfect performance of syntactic analyses. Furthermore, entire neural Chinese SC models have been under-investigated so far. In this work, we construct an SC dataset of Chinese colloquial sentences from a real-life question answering system in the telecommunication domain, and then, we propose a neural Chinese SC model enhanced with a Self-Organizing Map (SOM-NCSCM), to gain a valuable insight from the data and improve the performance of the whole neural Chinese SC model in a valid manner. Experimental results show that our SOM-NCSCM can significantly benefit from the deep investigation of similarity among data, and achieve a promising F1 score of 89.655 and BLEU4 score of 70.116, which also provides a baseline for further research on Chinese SC task.
The increasing use of social media sites in countries like India has given rise to large volumes of code-mixed data. Sentiment analysis of this data can provide integral insights into people's perspectives and opinions. Code-mixed data is often noisy in nature due to multiple spellings for the same word, lack of definite order of words in a sentence, and random abbreviations. Thus, working with code-mixed data is more challenging than monolingual data. Interpreting a model's predictions allows us to determine the robustness of the model against different forms of noise. In this paper, we propose a methodology to integrate explainable approaches into code-mixed sentiment analysis. By interpreting the predictions of sentiment analysis models we evaluate how well the model is able to adapt to the implicit noises present in code-mixed data.
Sarcasm is a linguistic expression often used to communicate the opposite of what is said, usually something that is very unpleasant with an intention to insult or ridicule. Inherent ambiguity in sarcastic expressions makes sarcasm detection very dif ficult. In this work, we focus on detecting sarcasm in textual conversations, written in English, from various social networking platforms and online media. To this end, we develop an interpretable deep learning model using multi-head self-attention and gated recurrent units. We show the effectiveness and interpretability of our approach by achieving state-of-the-art results on datasets from social networking platforms, online discussion forums, and political dialogues.
In this research ,we studied the problem of multicollinearity among independent variables in the multiple regression model this matter leads to a mistake in one of the essential conditions of the multiple regression model and getting incorrect res ults. At the beginning we have introduced documented theoretical study of the kinds of the multicollinearity and of the reasons of the problem of the multiple regression model and some methods to discover them. In addition to this we mentioned some methods that treat the cases of multiple regression model then we introduced a new method to treat multicollineartiy and apply it to an example . In this method we have dealt with multicollinearity on the hand and solved the problem of discrepancy between the significant of the regression model and the non-significant of one or more coefficient.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا