A Discriminative Vectorial Framework for Multi-modal Feature Representation


Published by Lei Gao

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Lei Gao, Ling Guan

Machine Learning


Abstract

Due to rapid advances in sensing and computing technology, multi-modal data sources that represent the same pattern or phenomenon have attracted growing attention. As a result, finding ways to extract useful information from these multi-modal data sources has quickly become a necessity. In this paper, a discriminative vectorial framework is proposed for multi-modal feature representation in knowledge discovery by employing multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis. Specifically, the proposed framework is capable of minimizing the semantic similarity among different modalities by MH and, jointly, extracting intrinsic discriminative representations across multiple data sources by DCM analysis, enabling a novel vectorial framework for multi-modal feature representation. Moreover, the proposed feature representation strategy is analyzed and further optimized for the canonical and non-canonical cases, respectively. Consequently, the generated feature representation makes effective use of high-quality input data sources, producing improved, and sometimes quite impressive, results in various applications. The effectiveness and generality of the proposed framework are demonstrated using classical features and deep neural network (DNN) based features, with applications to image and multimedia analysis and recognition tasks, including data visualization, face recognition, object recognition, cross-modal (text-image) recognition, and audio emotion recognition. Experimental results show that the proposed solutions are superior to state-of-the-art statistical machine learning (SML) and DNN algorithms.


Jie Xu, Yazhou Ren, Huayi Tang, 2021

Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, few methods consider the negative impact of views with unclear clustering structures, which results in poor multi-view clustering performance. To address this drawback, we propose self-supervised discriminative feature learning for deep multi-view clustering (SDMVC). Concretely, deep autoencoders are applied to learn embedded features for each view independently. To leverage the multi-view complementary information, we concatenate the embedded features of all views to form the global features, which can overcome the negative impact of some views' unclear clustering structures. In a self-supervised manner, pseudo-labels are obtained to build a unified target distribution for multi-view discriminative feature learning. During this process, global discriminative information can be mined to supervise all views to learn more discriminative features, which in turn are used to update the target distribution. In addition, this unified target distribution lets SDMVC learn consistent cluster assignments, achieving clustering consistency across multiple views while preserving the diversity of their features. Experiments on various types of multi-view datasets show that SDMVC achieves state-of-the-art performance.

Machine Learning; Computer Vision and Pattern Recognition
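
The two steps the SDMVC abstract names, concatenating per-view embeddings into global features and building a unified target distribution from soft cluster assignments, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the sharpening rule is the DEC-style target distribution, which is an assumption since the abstract does not state the exact formula.

```python
from typing import List

Matrix = List[List[float]]


def concat_views(views: List[Matrix]) -> Matrix:
    """Concatenate per-view embeddings sample by sample into global features."""
    n_samples = len(views[0])
    if any(len(view) != n_samples for view in views):
        raise ValueError("every view must embed the same samples")
    return [[x for view in views for x in view[i]] for i in range(n_samples)]


def sharpen_assignments(q: Matrix) -> Matrix:
    """Build a target distribution from soft assignments q (rows sum to 1).

    Assumption: DEC-style sharpening, p_ij proportional to q_ij^2 / sum_i q_ij.
    Every cluster is assumed to have non-zero total mass.
    """
    n_clusters = len(q[0])
    cluster_mass = [sum(row[j] for row in q) for j in range(n_clusters)]
    targets = []
    for row in q:
        weights = [row[j] ** 2 / cluster_mass[j] for j in range(n_clusters)]
        total = sum(weights)
        targets.append([w / total for w in weights])
    return targets


def pseudo_labels(p: Matrix) -> List[int]:
    """Hard pseudo-label per sample: index of the most probable cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in p]
```

In this sketch a single target distribution computed from the global features would supervise every view, which is one way to read the "unified target distribution" in the abstract.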

GESF: A Universal Discriminative Mapping Mechanism for Graph Representation Learning

Shupeng Gui, 2018

Graph embedding is a central problem in social network analysis and many other applications, aiming to learn a vector representation for each node. Most existing approaches need to specify the neighborhood and the form of dependence on the neighborhood, which may significantly degrade the flexibility of the representation. We propose a novel graph node embedding method, GESF, based on the set function technique. Our method can 1) learn an arbitrary form of representation function from the neighborhood, 2) automatically decide the significance of neighbors at different distances, and 3) be applied to heterogeneous graph embedding, where the graph may contain multiple types of nodes. A theoretical guarantee for the representation capability of our method is proved for general homogeneous and heterogeneous graphs, and evaluation results on benchmark data sets show that the proposed GESF outperforms state-of-the-art approaches in producing node vectors for classification tasks.

Machine Learning

A New Modal Autoencoder for Functionally Independent Feature Extraction

Yuzhu Guo, Kang Pan, Simeng Li, 2020

Autoencoders have been widely used for dimensionality reduction and feature extraction. Various types of autoencoders have been proposed by introducing regularization terms. Most of these regularizations improve representation learning by constraining the weights in the encoder part, which maps the input to hidden nodes and affects the generation of features. In this study, we show that a constraint on the decoder can also significantly improve performance, because the decoder determines how the latent variables contribute to the reconstruction of the input. Inspired by the structural modal analysis method in mechanical engineering, a new modal autoencoder (MAE) is proposed by orthogonalising the columns of the readout weight matrix. The new regularization helps to disentangle explanatory factors of variation and forces the MAE to extract fundamental modes in the data. The learned representations are functionally independent in the reconstruction of the input and perform better in subsequent classification tasks. The results were validated on the MNIST variations and USPS classification benchmark suite. Comparative experiments show that the new algorithm has a clear advantage. The new MAE introduces a very simple training principle for autoencoders and could be promising for the pre-training of deep neural networks.

Machine Learning
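
The decoder constraint in the MAE abstract, orthogonalising the columns of the readout weight matrix, can be illustrated with a simple penalty term. This is a minimal sketch under stated assumptions: the abstract does not give the exact regularizer, so the function below (a hypothetical name) uses one common choice, the sum of squared off-diagonal entries of W^T W, where each column of W is taken to be the readout weights of one latent variable.

```python
from typing import List


def column_orthogonality_penalty(w: List[List[float]]) -> float:
    """Sum of squared off-diagonal entries of W^T W.

    w is a readout (decoder) weight matrix stored as rows; column k is
    assumed to hold the weights that map latent variable k to the output.
    The penalty is zero exactly when all columns are mutually orthogonal.
    """
    n_rows, n_cols = len(w), len(w[0])
    penalty = 0.0
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            dot = sum(w[r][a] * w[r][b] for r in range(n_rows))
            # The Gram matrix is symmetric, so entries (a, b) and (b, a) match.
            penalty += 2.0 * dot * dot
    return penalty
```

In training, a term like this would typically be added to the reconstruction loss with a weighting coefficient; the coefficient and the optimisation details are not specified in the abstract.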

A Complete Discriminative Tensor Representation Learning for Two-Dimensional Correlation Analysis

Lei Gao, Ling Guan, 2021

As an effective tool for two-dimensional data analysis, two-dimensional canonical correlation analysis (2DCCA) not only preserves the intrinsic structural information of the original two-dimensional (2D) data, but also reduces the computational complexity effectively. However, due to its unsupervised nature, 2DCCA is incapable of extracting sufficiently discriminative representations, resulting in unsatisfactory performance. In this letter, we propose a complete discriminative tensor representation learning (CDTRL) method based on linear correlation analysis for analyzing 2D signals (e.g., images). This letter shows that the complete discriminative tensor representation strategy provides an effective vehicle for revealing and extracting the discriminant representations across 2D data sets, leading to improved results. Experimental results show that the proposed CDTRL outperforms state-of-the-art methods on the evaluated data sets.

Machine Learning
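
The quantity that 2DCCA works with can be sketched in a few lines. This is an illustration of plain, unsupervised 2DCCA as it is usually formulated, not of CDTRL: each 2D sample is kept as a matrix and reduced to a scalar by a left and a right projection vector, and the method seeks projections that maximise the correlation between the two projected sets. All function names are hypothetical, and the projection vectors are given as inputs rather than optimised.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_projection(left: Vector, x: Matrix, right: Vector) -> float:
    """Compute left^T X right without flattening the 2D sample X."""
    return sum(
        left[i] * x[i][j] * right[j]
        for i in range(len(left))
        for j in range(len(right))
    )


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("projected values must have non-zero variance")
    return cov / math.sqrt(var_a * var_b)


def projected_correlation(
    xs: Sequence[Matrix], ys: Sequence[Matrix],
    lx: Vector, rx: Vector, ly: Vector, ry: Vector,
) -> float:
    """Correlation between the two sets after their bilinear projections."""
    u = [bilinear_projection(lx, x, rx) for x in xs]
    v = [bilinear_projection(ly, y, ry) for y in ys]
    return pearson(u, v)
```

Because the projections act on rows and columns separately, the vectors have lengths equal to the matrix height and width rather than their product, which is where the reduced computational cost mentioned in the abstract comes from.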

Discriminative Feature Representation with Spatio-temporal Cues for Vehicle Re-identification

J. Tu, C. Chen, X. Huang, 2020

Vehicle re-identification (re-ID) aims to discover and match target vehicles in a gallery image set taken by different cameras across a wide range of road networks. It is crucial for many applications such as security surveillance and traffic management. The remarkably similar appearances of distinct vehicles and the significant changes in viewpoint and illumination conditions pose great challenges to vehicle re-ID. Conventional solutions focus on designing global visual appearances without sufficient consideration of vehicles' spatio-temporal relationships in different images. In this paper, we propose a novel discriminative feature representation with spatio-temporal cues (DFR-ST) for vehicle re-ID. It is capable of building robust features in the embedding space by involving appearance and spatio-temporal information. Based on this multi-modal information, the proposed DFR-ST constructs an appearance model for a multi-grained visual representation using a two-stream architecture, and a spatio-temporal metric to provide complementary information. Experimental results on two public datasets demonstrate that DFR-ST outperforms state-of-the-art methods, which validates the effectiveness of the proposed method.

Computer Vision and Pattern Recognition