The vast majority of Dimensionality Reduction (DR) techniques rely on second-order statistics to define their optimization objective. Although this provides adequate results in most cases, it comes with several shortcomings: the methods require carefully designed regularizers and are usually sensitive to outliers. In this work, a new DR framework is introduced that directly models the target distribution using the notion of similarity instead of distance. The proposed framework, called the Similarity Embedding Framework, overcomes the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to those of existing DR techniques. Deriving a new DR technique using the Similarity Embedding Framework becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised dimensionality reduction and providing out-of-sample extensions, as well as novel techniques, such as fast linear embeddings for complex methods, are demonstrated in this paper using the proposed framework. Six datasets from a diverse range of domains are used to evaluate the proposed method, and it is demonstrated that it can outperform many existing DR techniques.
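To illustrate the general idea of matching the similarities of a learned embedding to a chosen target similarity matrix, the following is a minimal sketch, not the authors' implementation. The Gaussian similarity kernel, the squared-error loss, the plain gradient-descent loop, and the `similarity_embedding` helper are all illustrative assumptions; the target matrix here encodes a supervised objective (same-class pairs should be similar).

```python
# Minimal sketch: learn a linear projection W so that the pairwise
# similarities of the projected points approximate a target matrix T.
import numpy as np


def pairwise_sq_dists(Y):
    """Squared Euclidean distances between all rows of Y."""
    sq = np.sum(Y ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T


def similarity_embedding(X, T, dim=2, sigma=1.0, lr=1e-3, n_iters=500):
    """Fit W (D x dim) so that exp(-||y_i - y_j||^2 / sigma) matches T."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    for _ in range(n_iters):
        Y = X @ W                                   # current embedding
        S = np.exp(-pairwise_sq_dists(Y) / sigma)   # embedding similarities
        G = -2.0 * (S - T) * S / sigma              # d(loss)/d(distances)
        grad_Y = 4.0 * (G.sum(axis=1, keepdims=True) * Y - G @ Y)
        W -= lr * X.T @ grad_Y                      # chain rule through Y = X W
    return W


# Example target for supervised DR: 1 for same-class pairs, 0 otherwise.
labels = np.repeat([0, 1, 2], 20)
T = (labels[:, None] == labels[None, :]).astype(float)
X = np.random.default_rng(1).normal(size=(60, 10))
W = similarity_embedding(X, T)
Y = X @ W   # 2-D embedding whose similarities approximate T
```

Deriving a different DR technique under this view only requires swapping in another target matrix T, e.g. the similarities produced by an existing (possibly nonlinear) method when a fast linear approximation of it is desired.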