No Arabic abstract
Feature selection is a pattern recognition approach to choose important variables according to some criteria to distinguish or explain certain phenomena. There are many genomic and proteomic applications which rely on feature selection to answer questions such as: selecting signature genes which are informative about some biological state, e.g. normal tissues and several types of cancer; or defining a network of prediction or inference among elements such as genes, proteins, external stimuli and other elements of interest. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions are proposed, although it is difficult to point the best solution in general. The intent of this work is to provide an open-source multiplataform graphical environment to apply, test and compare many feature selection approaches suitable to be used in bioinformatics problems.
The vast majority of Dimensionality Reduction (DR) techniques rely on second-order statistics to define their optimization objective. Even though this provides adequate results in most cases, it comes with several shortcomings. The methods require carefully designed regularizers and they are usually prone to outliers. In this work, a new DR framework, that can directly model the target distribution using the notion of similarity instead of distance, is introduced. The proposed framework, called Similarity Embedding Framework, can overcome the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to existing DR techniques. Deriving a new DR technique using the Similarity Embedding Framework becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised dimensionality reduction and providing out-of-of-sample extensions, as well as, new novel techniques, such as providing fast linear embeddings for complex techniques, are demonstrated in this paper using the proposed framework. Six datasets from a diverse range of domains are used to evaluate the proposed method and it is demonstrated that it can outperform many existing DR techniques.
We present a system to convert any set of images (e.g., a video clip or a photo album) into a storyboard. We aim to create multiple pleasing graphic representations of the content at interactive rates, so the user can explore and find the storyboard (images, layout, and stylization) that best suits their needs and taste. The main challenges of this work are: selecting the content images, placing them into panels, and applying a stylization. For the latter, we propose an interactive design tool to create new stylizations using a wide range of filter blocks. This approach unleashes the creativity by allowing the user to tune, modify, and intuitively design new sequences of filters. In parallel to this manual design, we propose a novel procedural approach that automatically assembles sequences of filters for innovative results. We aim to keep the algorithm complexity as low as possible such that it can run interactively on a mobile device. Our results include examples of styles designed using both our interactive and procedural tools, as well as their final composition into interesting and appealing storyboards.
Grassmann manifolds have been widely used to represent the geometry of feature spaces in a variety of problems in medical imaging and computer vision including but not limited to shape analysis, action recognition, subspace clustering and motion segmentation. For these problems, the features usually lie in a very high-dimensional Grassmann manifold and hence an appropriate dimensionality reduction technique is called for in order to curtail the computational burden. To this end, the Principal Geodesic Analysis (PGA), a nonlinear extension of the well known principal component analysis, is applicable as a general tool to many Riemannian manifolds. In this paper, we propose a novel framework for dimensionality reduction of data in Riemannian homogeneous spaces and then focus on the Grassman manifold which is an example of a homogeneous space. Our framework explicitly exploits the geometry of the homogeneous space yielding reduced dimensional nested sub-manifolds that need not be geodesic submanifolds and thus are more expressive. Specifically, we project points in a Grassmann manifold to an embedded lower dimensional Grassmann manifold. A salient feature of our method is that it leads to higher expressed variance compared to PGA which we demonstrate via synthetic and real data experiments.
This paper proposes a generalized framework with joint normalization which learns lower-dimensional subspaces with maximum discriminative power by making use of the Riemannian geometry. In particular, we model the similarity/dissimilarity between subspaces using various metrics defined on Grassmannian and formulate dimen-sionality reduction as a non-linear constraint optimization problem considering the orthogonalization. To obtain the linear mapping, we derive the components required to per-form Riemannian optimization (e.g., Riemannian conju-gate gradient) from the original Grassmannian through an orthonormal projection. We respect the Riemannian ge-ometry of the Grassmann manifold and search for this projection directly from one Grassmann manifold to an-other face-to-face without any additional transformations. In this natural geometry-aware way, any metric on the Grassmann manifold can be resided in our model theoreti-cally. We have combined five metrics with our model and the learning process can be treated as an unconstrained optimization problem on a Grassmann manifold. Exper-iments on several datasets demonstrate that our approach leads to a significant accuracy gain over state-of-the-art methods.
This report concerns the problem of dimensionality reduction through information geometric methods on statistical manifolds. While there has been considerable work recently presented regarding dimensionality reduction for the purposes of learning tasks such as classification, clustering, and visualization, these methods have focused primarily on Riemannian manifolds in Euclidean space. While sufficient for many applications, there are many high-dimensional signals which have no straightforward and meaningful Euclidean representation. In these cases, signals may be more appropriately represented as a realization of some distribution lying on a statistical manifold, or a manifold of probability density functions (PDFs). We present a framework for dimensionality reduction that uses information geometry for both statistical manifold reconstruction as well as dimensionality reduction in the data domain.