Research papers and master's and doctoral theses published by Alfred O. Hero

Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models

100 - Yasin Yilmaz, Mehmet Aktukmak, Alfred O. Hero 2021

The commonly used latent space embedding techniques, such as Principal Component Analysis, Factor Analysis, and manifold learning techniques, are typically used for learning effective representations of homogeneous data. However, they do not readily extend to heterogeneous data that are a combination of numerical and categorical variables, e.g., arising from linked GPS and text data. In this paper, we are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. The learned generative model provides latent unified representations that capture the factors common to the multiple dimensions of the data, and thus enable fusing multimodal data for various machine learning tasks. Following a Bayesian approach, we propose a general framework that combines disparate data types through the natural parameterization of the exponential family of distributions. To scale the model inference to millions of instances with thousands of features, we use the Laplace-Bernstein approximation for posterior computations involving nonlinear link functions. The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features. Experiments on two high-dimensional and heterogeneous datasets (NYC Taxi and MovieLens-10M) demonstrate the scalability and competitive performance of the proposed algorithm on different machine learning tasks such as anomaly detection, data imputation, and recommender systems.

Machine Learning

A unified framework for correlation mining in ultra-high dimension

113 - Alfred O. Hero, Bala Rajaratnam, Yun Wei 2021

An important problem in large scale inference is the identification of variables that have large correlations or partial correlations. Recent work has yielded breakthroughs in the ultra-high dimensional setting when the sample size $n$ is fixed and the dimension $p \rightarrow \infty$ ([Hero, Rajaratnam 2011, 2012]). Despite these advances, the correlation screening framework suffers from some serious practical, methodological and theoretical deficiencies. For instance, theoretical safeguards for partial correlation screening require that the population covariance matrix be block diagonal. This block sparsity assumption is, however, highly restrictive in numerous practical applications. As a second example, results for the correlation and partial correlation screening framework require the estimation of dependence measures or functionals, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining which specifically goes beyond the block diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, the number of highly correlated or partially correlated variables is shown to have novel compound Poisson finite-sample characterizations, which hold both for the finite $p$ case and when $p \rightarrow \infty$. The unifying framework also demonstrates an important duality between correlation and partial correlation screening with important theoretical and practical consequences.
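As a rough illustration of the screening task described in this abstract, the sketch below flags variables whose largest absolute sample correlation with any other variable reaches a threshold. The function name `correlation_screen`, the threshold `rho`, and the synthetic data are assumptions made for this example; it is plain thresholding of the sample correlation matrix and does not implement the paper's compound Poisson characterization or its partial correlation counterpart.

```python
import numpy as np

def correlation_screen(X, rho):
    """Return indices of variables whose largest absolute sample
    correlation with any other variable is at least rho."""
    R = np.corrcoef(X, rowvar=False)   # p-by-p sample correlation matrix
    np.fill_diagonal(R, 0.0)           # ignore self-correlations
    max_abs = np.abs(R).max(axis=0)    # strongest correlation per variable
    return np.flatnonzero(max_abs >= rho)

# Synthetic example (hypothetical data): n = 20 samples, p = 50 variables,
# with variable 1 constructed as a noisy copy of variable 0.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)

discoveries = correlation_screen(X, rho=0.95)
print(discoveries)
```

With few samples and many variables, purely spurious correlations can become large, which is why the choice of threshold matters in this regime.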

Statistics Theory

Space-Time Adaptive Detection at Low Sample Support

51 - Benjamin D. Robinson, Robert Malinas, Alfred O. Hero III 2020

An important problem in space-time adaptive detection is the estimation of the large p-by-p interference covariance matrix from training signals. When the number of training signals n is greater than 2p, existing estimators are generally considered to be adequate, as demonstrated by fixed-dimensional asymptotics. But in the low-sample-support regime (n < 2p or even n < p) fixed-dimensional asymptotics are no longer applicable. The remedy undertaken in this paper is to consider the large dimensional limit in which n and p go to infinity together. In this asymptotic regime, a new type of estimator is defined (Definition 2), shown to exist (Theorem 1), and shown to be detection-theoretically ideal (Theorem 2). Further, asymptotic conditional detection and false-alarm rates of filters formed from this type of estimator are characterized (Theorems 3 and 4) and shown to depend only on data that is given, even for non-Gaussian interference statistics. The paper concludes with several Monte Carlo simulations that compare the performance of the estimator in Theorem 1 to the predictions of Theorems 2-4, showing in particular higher detection probability than Steiner and Gerlach's Fast Maximum Likelihood estimator.

Signal Processing

Pattern-Based Analysis of Time Series: Estimation

155 - Elyas Sabeti, Peter X.K. Song, Alfred O. Hero 2020

While Internet of Things (IoT) devices and sensors create continuous streams of information, Big Data infrastructures are deemed to handle the influx of data in real-time. One such type of continuous stream of information is time series data. Due to the richness of information in time series and the inadequacy of summary statistics to encapsulate structures and patterns in such data, development of new approaches to learn time series is of interest. In this paper, we propose a novel method, called pattern tree, to learn patterns in the time series using a binary-structured tree. While a pattern tree can be used for many purposes such as lossless compression, prediction and anomaly detection, in this paper we focus on its application in time series estimation and forecasting. In comparison to other methods, our proposed pattern tree method improves the mean squared error of estimation.

Methodology, Information Theory, Signal Processing

Straggler Robust Distributed Matrix Inverse Approximation

183 - Neophytos Charalambides, Mert Pilanci, Alfred O. Hero III 2020

A cumbersome operation in numerical analysis, linear algebra, optimization, machine learning and engineering algorithms is inverting large full-rank matrices, which appears in various processes and applications. This has both numerical stability and complexity issues, as well as a high expected time to compute. We address the latter issue by proposing an algorithm which uses a black-box least squares optimization solver as a subroutine to give an estimate of the inverse (and pseudoinverse) of real nonsingular matrices by estimating its columns. This also gives it the flexibility to be performed in a distributed manner, so the estimate can be obtained a lot faster and can be made robust to stragglers. Furthermore, we assume a centralized network with no message passing between the computing nodes, and do not require a matrix factorization (e.g. LU, SVD or QR decomposition) beforehand.
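The column-wise idea in this abstract can be sketched as follows: column i of the inverse is the solution of the least squares problem min_x ||A x - e_i||, where e_i is the i-th standard basis vector, so each column can be estimated independently. The function name `approx_inverse_by_columns` and the `solver` argument are assumptions for this example. `numpy.linalg.lstsq` is used here only as a convenient stand-in for the black-box solver; it is SVD-based internally, so this sketch does not reflect the paper's factorization-free setting, and the distribution across nodes and straggler handling are not modeled.

```python
import numpy as np

def approx_inverse_by_columns(A, solver=None):
    """Estimate the inverse of a square nonsingular matrix one column at a time.

    Column i solves min_x ||A x - e_i||_2. The columns are independent,
    so in principle each could be assigned to a different worker.
    """
    n = A.shape[0]
    if solver is None:
        # Stand-in black-box least squares solver (an assumption of this sketch).
        solver = lambda M, b: np.linalg.lstsq(M, b, rcond=None)[0]
    identity = np.eye(n)
    columns = [solver(A, identity[:, i]) for i in range(n)]
    return np.column_stack(columns)

# Hypothetical well-conditioned test matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 10.0 * np.eye(5)

A_inv_hat = approx_inverse_by_columns(A)
residual = np.linalg.norm(A @ A_inv_hat - np.eye(5))
print(residual)
```

Because each column is a separate problem, any iterative least squares routine with the same `(matrix, vector)` signature can be passed in as `solver`.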

Numerical Analysis, Information Theory

The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version

218 - Abram Magner, Mayank Baranwal, Alfred O. Hero III 2020

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. These results theoretically match empirical observations of several prior works. Finally, we show a converse result that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.

Machine Learning, Information Theory

Fundamental Limits of Deep Graph Convolutional Networks

76 - Abram Magner, Mayank Baranwal, Alfred O. Hero III 2019

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. To elucidate the capabilities and limitations of GCNs, we investigate their power, as a function of their number of layers, to distinguish between different random graph models (corresponding to different class-conditional distributions in a classification problem) on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We give a precise characterization of the set of pairs of graphons that are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. This characterization is in terms of a degree profile closeness property. Outside this class, a very simple GCN architecture suffices for distinguishability. We then exhibit a concrete, infinite class of graphons arising from stochastic block models that are well-separated in terms of cut distance and are indistinguishable by a GCN. These results theoretically match empirical observations of several prior works. To prove our results, we exploit a connection to random walks on graphs. Finally, we give empirical results on synthetic and real graph classification datasets, indicating that indistinguishable graph distributions arise in practice.

Machine Learning, Information Theory

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

137 - Alexander Jung, Alfred O. Hero III, Alexandru Mara 2019

We propose and analyze a method for semi-supervised learning from partially-labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) are similar valued. This lends itself naturally to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach are verified by numerical experiments.

Machine Learning

Scalable Mutual Information Estimation using Dependence Graphs

133 - Morteza Noshad, Yu Zeng, Alfred O. Hero III 2018

The Mutual Information (MI) is an often used measure of dependency between two random variables utilized in information theory, statistics and machine learning. Recently several MI estimators have been proposed that can achieve the parametric MSE convergence rate. However, most of the previously proposed estimators have a high computational complexity of at least $O(N^2)$. We propose a unified method for empirical non-parametric estimation of a general MI function between random vectors in $\mathbb{R}^d$ based on $N$ i.i.d. samples. The reduced complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity $O(N)$, and can achieve the optimal parametric MSE rate of $O(1/N)$ if the density is $d$ times differentiable. To the best of our knowledge EDGE is the first non-parametric MI estimator that can achieve parametric MSE rates with linear time complexity. We illustrate the utility of EDGE for the analysis of the information plane (IP) in deep learning. Using EDGE we shed light on a controversy on whether or not the compression property of information bottleneck (IB) in fact holds for ReLU and other rectification functions in deep neural networks (DNN).

Information Theory, Machine Learning

On Decentralized Estimation with Active Queries

32 - Theodoros Tsiligkaridis, Brian M. Sadler, Alfred O. Hero III 2013

We consider the problem of decentralized 20 questions with noise for multiple players/agents under the minimum entropy criterion in the setting of stochastic search over a parameter space, with application to target localization. We propose decentralized extensions of the active query-based stochastic search strategy that combines elements from the 20 questions approach and social learning. We prove convergence to correct consensus on the value of the parameter. This framework provides a flexible and tractable mathematical model for decentralized parameter estimation systems based on active querying. We illustrate the effectiveness and robustness of the proposed decentralized collaborative 20 questions algorithm for random network topologies with information sharing.
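To make the "20 questions with noise" setting more concrete, the following is a minimal single-agent sketch of probabilistic bisection on a discretized interval: the agent asks whether the target lies at or below the current posterior median, receives an answer that is flipped with probability `eps`, and applies a Bayes update. All names and parameter values (`probabilistic_bisection`, `eps`, `grid`, the target value) are assumptions for illustration. The decentralized extensions, social learning, and consensus analysis from the abstract are not implemented here.

```python
import numpy as np

def probabilistic_bisection(target, n_queries=100, eps=0.1, grid=1000, seed=0):
    """Single-agent noisy 20 questions on [0, 1] with a gridded posterior."""
    rng = np.random.default_rng(seed)
    x = (np.arange(grid) + 0.5) / grid       # cell centers of the grid
    post = np.full(grid, 1.0 / grid)         # uniform prior over cells
    for _ in range(n_queries):
        cdf = np.cumsum(post)
        m = x[np.searchsorted(cdf, 0.5)]     # query point: posterior median
        truth = bool(target <= m)            # noiseless answer to "target <= m?"
        # The answer is flipped with probability eps (noisy oracle).
        answer = truth if rng.random() > eps else (not truth)
        # Likelihood of the observed answer under each candidate location.
        if answer:
            like = np.where(x <= m, 1.0 - eps, eps)
        else:
            like = np.where(x <= m, eps, 1.0 - eps)
        post = post * like
        post /= post.sum()                   # Bayes update
    return x[np.argmax(post)], post

estimate, posterior = probabilistic_bisection(target=0.3141)
print(estimate)
```

Querying at the posterior median splits the posterior mass evenly, which is the usual way to make each noisy answer as informative as possible in this one-dimensional setting.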

Multiagent Systems, Information Theory, Systems and Control
