ﻻ يوجد ملخص باللغة العربية
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide ranging in many branches of science.
This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle t
The entropy of a pair of random variables is commonly depicted using a Venn diagram. This representation is potentially misleading, however, since the multivariate mutual information can be negative. This paper presents new measures of multivariate i
This paper is a strongly geometrical approach to the Fisher distance, which is a measure of dissimilarity between two probability distribution functions. The Fisher distance, as well as other divergence measures, are also used in many applications to
Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarc
We study the problem of recovering a hidden community of cardinality $K$ from an $n times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} sim P$ if $i, j$ both belong to the community and $A_{ij} sim Q$ otherwise, for two know