ترغب بنشر مسار تعليمي؟ اضغط هنا

Integrating Information About Entities Progressively

285   0   0.0 ( 0 )
 نشر من قبل Christopher Buss
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Users often have to integrate information about entities from multiple data sources. This task is challenging as each data source may represent information about the same entity in a distinct form, e.g., each data source may use a different name for the same person. Currently, data from different representations are translated into a unified one via lengthy and costly expert attention and tuning. Such methods cannot scale to the rapidly increasing number and variety of available data sources. We demonstrate ProgMap, a entity-matching framework in which data sources learn to collaborate and integrate information about entities on-demand and with minimal expert intervention. The data sources leverage user feedback to improve the accuracy of their collaboration and results. ProgMap also has techniques to reduce the amount of required user feedback to achieve effective matchings.

قيم البحث

اقرأ أيضاً

208 - Rada Chirkova , Ting Yu 2014
We consider the problems of finding and determining certain query answers and of determining containment between queries; each problem is formulated in presence of materialized views and dependencies under the closed-world assumption. We show a tight relationship between the problems in this setting. Further, we introduce algorithms for solving each problem for those inputs where all the queries and views are conjunctive, and the dependencies are embedded weakly acyclic. We also determine the complexity of each problem under the security-relevant complexity measure introduced by Zhang and Mendelzon in 2005. The problems studied in this paper are fundamental in ensuring correct specification of database access-control policies, in particular in case of fine-grained access control. Our approaches can also be applied in the areas of inference control, secure data publishing, and database auditing.
Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). Current EA approaches suffer from scalability issues, limiting their usage in real-world EA scenarios. To tackle this challenge, we propose LargeEA to align e ntities between large-scale KGs. LargeEA consists of two channels, i.e., structure channel and name channel. For the structure channel, we present METIS-CPS, a memory-saving mini-batch generation strategy, to partition large KGs into smaller mini-batches. LargeEA, designed as a general tool, can adopt any existing EA approach to learn entities structural features within each mini-batch independently. For the name channel, we first introduce NFF, a name feature fusion method, to capture rich name features of entities without involving any complex training process. Then, we exploit a name-based data augmentation to generate seed alignment without any human intervention. Such design fits common real-world scenarios much better, as seed alignment is not always available. Finally, LargeEA derives the EA results by fusing the structural features and name features of entities. Since no widely-acknowledged benchmark is available for large-scale EA evaluation, we also develop a large-scale EA benchmark called DBP1M extracted from real-world KGs. Extensive experiments confirm the superiority of LargeEA against state-of-the-art competitors.
Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent entities in different languages. The task of cross-lingual entity alignment is to match entities in a source language with their counterparts in target languages. In this work, we investigate embedding-based approaches to encode entities from multilingual KGs into the same vector space, where equivalent entities are close to each other. Specifically, we apply graph convolutional networks (GCNs) to combine multi-aspect information of entities, including topological connections, relations, and attributes of entities, to learn entity embeddings. To exploit the literal descriptions of entities expressed in different languages, we propose two uses of a pretrained multilingual BERT model to bridge cross-lingual gaps. We further propose two strategies to integrate GCN-based and BERT-based modules to boost performance. Extensive experiments on two benchmark datasets demonstrate that our method significantly outperforms existing systems.
In this paper, we consider networks of deterministic spiking neurons, firing synchronously at discrete times; such spiking neural networks are inspired by networks of neurons and synapses that occur in brains. We consider the problem of translating t emporal information into spatial information in such networks, an important task that is carried out by actual brains. Specifically, we define two problems: First Consecutive Spikes Counting (FCSC) and Total Spikes Counting (TSC), which model spike and rate coding aspects of translating temporal information into spatial information respectively. Assuming an upper bound of $T$ on the length of the temporal input signal, we design two networks that solve these two problems, each using $O(log T)$ neurons and terminating in time $1$. We also prove that there is no network with less than $T$ neurons that solves either question in time $0$.
This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful i nformation about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the structural information from the auxiliary information, we present to construct data clusters according to the auxiliary information. Then, we introduce the Clustering InfoNCE (Cl-InfoNCE) objective that learns similar representations for augmented variants of data from the same cluster and dissimilar representations for data from different clusters. Our approach contributes as follows: 1) Comparing to conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring the performance closer to the supervised representations; 2) The presented Cl-InfoNCE can also work with unsupervised constructed clusters (e.g., k-means clusters) and outperform strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better approach to leverage the data clustering information, by comparing it to the baseline approach - learning to predict the clustering assignments with cross-entropy loss. For analysis, we connect the goodness of the learned representations with the statistical relationships: i) the mutual information between the labels and the clusters and ii) the conditional entropy of the clusters given the labels.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا