In this paper, we propose a global, self-explainable solution to a prominent NLP problem: Entity Resolution (ER). We formulate ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise similarity scores between the mentions are used to assign each node to exactly one clique, which represents a real-world entity in the ER domain. We use the Clique Partitioning Problem (CPP), which is an Integer Program (IP), to formulate ER as a graph partitioning problem, and then highlight the explainable nature of this method. Since CPP is NP-hard, we introduce an efficient solution procedure, the xER algorithm, which solves CPP as a combination of finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of using xER over traditional methods and report computational experiments and results of applying this method to ER data sets.
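To make the two-stage idea concrete, the following is a minimal sketch in Python, not the authors' xER implementation. It assumes a similarity threshold that decides which mention pairs get an edge, enumerates maximal cliques with a plain Bron-Kerbosch recursion, and then replaces the paper's set-packing formulation (which the abstract does not spell out) with an exhaustive search over disjoint cliques. The names `maximal_cliques`, `resolve`, and `threshold`, and the objective (sum of similarity minus threshold inside each cluster), are illustrative assumptions. The search is exponential, so it is only suitable for toy inputs.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting).

    adj maps each node to the set of its neighbours.
    """
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def resolve(mentions, sim, threshold=0.5):
    """Partition mentions into disjoint cliques (hypothetical helper).

    sim maps frozenset({a, b}) to a similarity score; missing pairs count as 0.
    """
    # Assumption: two mentions are linked when their score exceeds the threshold.
    adj = {m: set() for m in mentions}
    for a, b in combinations(mentions, 2):
        if sim.get(frozenset((a, b)), 0.0) > threshold:
            adj[a].add(b)
            adj[b].add(a)

    # Candidate clusters: every non-empty subset of a maximal clique,
    # since any subset of a clique is itself a clique.
    candidates = set()
    for clique in maximal_cliques(adj):
        members = sorted(clique)
        for k in range(1, len(members) + 1):
            candidates.update(frozenset(s) for s in combinations(members, k))

    def score(cluster):
        # Assumed objective: reward pairs by how far they sit above the threshold.
        return sum(sim.get(frozenset(pair), 0.0) - threshold
                   for pair in combinations(sorted(cluster), 2))

    best = (float("-inf"), [])

    def search(remaining, chosen, total):
        # Brute-force stand-in for set packing: cover every mention exactly once.
        nonlocal best
        if not remaining:
            if total > best[0]:
                best = (total, list(chosen))
            return
        first = min(remaining)
        for cluster in candidates:
            if first in cluster and cluster <= remaining:
                chosen.append(cluster)
                search(remaining - cluster, chosen, total + score(cluster))
                chosen.pop()

    search(frozenset(mentions), [], 0.0)
    return sorted(sorted(cluster) for cluster in best[1])


if __name__ == "__main__":
    scores = {
        frozenset(("J. Smith", "John Smith")): 0.9,
        frozenset(("John Smith", "Jon Smith")): 0.8,
        frozenset(("J. Smith", "Jon Smith")): 0.7,
        frozenset(("J. Smith", "Jane Doe")): 0.6,
    }
    names = ["J. Smith", "John Smith", "Jon Smith", "Jane Doe"]
    print(resolve(names, scores))
```

Because every returned cluster is a clique of the thresholded graph, each grouping can be explained by pointing at the pairwise scores that link its members, which is one way to read the "self-explainable" claim. A practical implementation would hand the selection step to an integer programming solver instead of enumerating partitions.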
References used
https://aclanthology.org/