Exploring dual information in distance metric learning for clustering

Publication date: 2021
Language: English





Distance metric learning algorithms aim to appropriately measure similarities and distances between data points. In the context of clustering, metric learning is typically applied with the help of side-information provided by experts, most commonly expressed as must-link and cannot-link constraints. In this setting, distance metric learning algorithms pull together pairs of data points involved in must-link constraints and push apart pairs of points involved in cannot-link constraints. For these algorithms to be effective, the distance metric must match the experts' knowledge, beliefs, and expectations, and the transformations applied to satisfy the side-information should preserve the geometrical properties of the dataset. It is also useful to filter the constraints provided by the experts, keeping only the most useful ones and rejecting those that can harm the clustering process. To address these issues, we propose to exploit the dual information associated with the pairwise constraints of the semi-supervised clustering problem. Experiments clearly show that distance metric learning algorithms benefit from integrating this dual information.
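
To make the must-link / cannot-link mechanism concrete, here is a minimal sketch in Python. It is not the method proposed in the paper and does not use the dual information. It assumes a Mahalanobis-style distance parameterised by a hypothetical matrix `L`, and a toy loss (squared must-link distances plus a hinge on cannot-link pairs). The names `sq_dist`, `constraint_step`, `lr`, and `margin` are illustrative assumptions.

```python
import numpy as np

def sq_dist(L, x, y):
    """Squared distance ||L(x - y)||^2 under the linear map L (assumed parameterisation)."""
    diff = L @ (x - y)
    return float(diff @ diff)

def constraint_step(L, X, must_link, cannot_link, lr=0.01, margin=1.0):
    """One gradient step on a toy loss, for illustration only:
    sum of squared must-link distances plus a hinge on cannot-link pairs."""
    grad = np.zeros_like(L)
    for i, j in must_link:
        d = (X[i] - X[j])[:, None]
        grad += 2.0 * L @ (d @ d.T)        # descent shrinks this pair's distance
    for i, j in cannot_link:
        d = (X[i] - X[j])[:, None]
        if sq_dist(L, X[i], X[j]) < margin:
            grad -= 2.0 * L @ (d @ d.T)    # descent grows this pair's distance
    return L - lr * grad

# Toy data: points 0 and 1 must be linked, points 0 and 2 cannot be linked.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
L0 = np.eye(2)
L1 = constraint_step(L0, X, must_link=[(0, 1)], cannot_link=[(0, 2)], lr=0.1)
```

After one step on this toy data, the must-link pair is closer and the cannot-link pair is farther apart than under the identity map. A real algorithm would also have to preserve the geometry of the dataset and decide which constraints to trust, which is the concern the abstract raises.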




Read More

Panpan Yu, Qingna Li 2019
Image ranking orders images based on a set of images whose ranks are already known. In this paper, we propose an improved linear ordinal distance metric learning approach based on the linear distance metric learning model. By decomposing the distance metric $A$ as $L^TL$, the problem can be cast as looking for a linear map between two sets of points in different spaces while maintaining some of the data structure. The ordinal relation of the labels can be maintained via classical multidimensional scaling, a popular tool for dimension reduction in statistics. A least squares fitting term is then introduced to the cost function, which can also maintain the local data structure. The resulting model is an unconstrained problem and can better fit the data structure. Extensive numerical results demonstrate the improvement of the new approach over the linear distance metric learning model in both speed and ranking performance.
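
The decomposition $A = L^TL$ mentioned above can be checked numerically. The sketch below is only an illustration of that identity, not the authors' model: it assumes a randomly drawn hypothetical map `L` and shows that the distance induced by `A` equals the squared Euclidean distance between the mapped points, and that `A` is positive semidefinite by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))    # hypothetical linear map from R^3 to R^2
A = L.T @ L                    # induced metric matrix (3 x 3)

def dist_A(x, y, A):
    """Squared distance (x - y)^T A (x - y)."""
    d = x - y
    return float(d @ A @ d)

def dist_mapped(x, y, L):
    """Squared Euclidean distance between the mapped points Lx and Ly."""
    d = L @ x - L @ y
    return float(d @ d)

x = rng.normal(size=3)
y = rng.normal(size=3)
same = np.isclose(dist_A(x, y, A), dist_mapped(x, y, L))
min_eig = float(np.linalg.eigvalsh(A).min())   # non-negative up to rounding error
```

Working with `L` instead of `A` removes the positive semidefiniteness constraint from the optimisation, which is consistent with the abstract's remark that the resulting model is unconstrained.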
Deep Metric Learning (DML), a widely used technique, involves learning a distance metric between pairs of samples. DML uses deep neural architectures to learn semantic embeddings of the input, where the distance between similar examples is small while dissimilar ones are far apart. Although the underlying neural networks produce good accuracy on naturally occurring samples, they are vulnerable to adversarially perturbed samples that reduce performance. We take a first step towards training robust DML models and tackle the primary challenge of the metric losses being dependent on the samples in a mini-batch, unlike standard losses that only depend on the specific input-output pair. We analyze this dependence effect and contribute a robust optimization formulation. Using experiments on three commonly used DML datasets, we demonstrate 5- to 76-fold increases in adversarial accuracy, and outperform an existing DML model that sought to be robust.
Jiawei Zhang 2020
Graph distance metric learning serves as the foundation for many graph learning problems, e.g., graph clustering, graph classification and graph matching. Existing research on graph distance metric (or graph kernel) learning fails to maintain the basic properties of such metrics, namely non-negativity, identity of indiscernibles, symmetry and the triangle inequality. In this paper, we introduce a new graph neural network based distance metric learning approach, namely GB-DISTANCE (GRAPH-BERT based Neural Distance). Based solely on the attention mechanism, GB-DISTANCE can learn graph instance representations effectively using a pre-trained GRAPH-BERT model. Unlike the existing supervised and unsupervised metrics, GB-DISTANCE can be learned effectively in a semi-supervised manner. In addition, GB-DISTANCE maintains the basic distance metric properties mentioned above. Extensive experiments on several benchmark graph datasets demonstrate that GB-DISTANCE can outperform the existing baseline methods, especially the recent graph neural network based graph metrics, by a significant margin in computing the graph distance.
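
The four metric properties listed above can be tested on any pairwise distance matrix. The following sketch is unrelated to GB-DISTANCE itself: it assumes a hypothetical helper `check_metric_properties` that takes a square matrix `D` of distances between distinct items and reports which properties hold up to a tolerance `tol`.

```python
import numpy as np

def check_metric_properties(D, tol=1e-9):
    """Check the four metric axioms on a square distance matrix D.
    Assumes the rows of D correspond to pairwise distinct items."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return {
        "non_negativity": bool((D >= -tol).all()),
        # d(x, x) = 0, and d(x, y) > 0 for distinct items x != y
        "identity_of_indiscernibles": bool(
            np.allclose(np.diag(D), 0.0, atol=tol) and (D[off_diag] > tol).all()
        ),
        "symmetry": bool(np.allclose(D, D.T, atol=tol)),
        # d(i, k) <= d(i, j) + d(j, k) for every triple (i, j, k)
        "triangle_inequality": bool(
            (D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol).all()
        ),
    }

# Euclidean distances between (0, 0), (3, 0) and (0, 4): a valid metric.
euclid = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
# Squared distances between the collinear points 0, 1, 2: the triangle inequality fails.
squared = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
ok = check_metric_properties(euclid)
bad = check_metric_properties(squared)
```

The second example shows why a learned dissimilarity is not automatically a metric: squared Euclidean distance is non-negative and symmetric but violates the triangle inequality.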
We study a quantum information metric (or fidelity susceptibility) in conformal field theories with respect to a small perturbation by a primary operator. We argue that its gravity dual is approximately given by the volume of a maximal time slice in an AdS spacetime when the perturbation is exactly marginal. We confirm our claim in several examples.
We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies (qualitative interviews, a controlled experiment, and a card-sorting task) to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust in, and the understandability of, the tool; and that, of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust in an AutoML tool.
