أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Marco Mondelli

Decoding Reed-Muller Codes with Successive Factor-Graph Permutations

116 - Nghia Doan , Seyyed Ali Hashemi , Marco Mondelli 2021

This paper presents a novel successive factor-graph permutation (SFP) scheme that significantly improves the error-correction performance of Reed-Muller (RM) codes under successive-cancellation list (SCL) decoding. In particular, we perform maximum-l ikelihood decoding on the symmetry group of RM codes to carefully select a good factor-graph permutation on the fly. We further propose an SFP-aided fast SCL decoding that significantly reduces the decoding latency while preserving the error-correction performance of the code. The simulation results show that for the third and fourth order RM codes of length $256$, the proposed decoder reduces up to $85%$ of the memory consumption, $78%$ of the decoding latency, and more than $99%$ of the computational complexity of the state-of-the-art recursive projection-aggregation decoder at the frame error rate of $10^{-3}$.

نظرية المعلومات نظرية المعلومات

PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

111 - Marco Mondelli , Ramji Venkataramanan 2021

We study the problem of estimating a rank-$1$ signal in the presence of rotationally invariant noise-a class of perturbations more general than Gaussian noise. Principal Component Analysis (PCA) provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime. Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an alternative estimator with the potential to improve the accuracy of PCA. However, the existing analysis of AMP requires an initialization that is both correlated with the signal and independent of the noise, which is often unrealistic in practice. In this work, we combine the two methods, and propose to initialize AMP with PCA. Our main result is a rigorous asymptotic characterization of the performance of this estimator. Both the AMP algorithm and its analysis differ from those previously derived in the Gaussian setting: at every iteration, our AMP algorithm requires a specific term to account for PCA initialization, while in the Gaussian case, PCA initialization affects only the first iteration of AMP. The proof is based on a two-phase artificial AMP that first approximates the PCA estimator and then mimics the true AMP. Our numerical simulations show an excellent agreement between AMP results and theoretical predictions, and suggest an interesting open direction on achieving Bayes-optimal performance.

التعلم الالي نظرية المعلومات التعلم الآلي

On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality

89 - Quynh Nguyen , Pierre Brechet , Marco Mondelli 2021

It has been empirically observed that, in deep neural networks, the solutions found by stochastic gradient descent from different random initializations can be often connected by a path with low loss. Recent works have shed light on this intriguing p henomenon by assuming either the over-parameterization of the network or the dropout stability of the solutions. In this paper, we reconcile these two views and present a novel condition for ensuring the connectivity of two arbitrary points in parameter space. This condition is provably milder than dropout stability, and it provides a connection between the problem of finding low-loss paths and the memorization capacity of neural nets. This last point brings about a trade-off between the quality of features at each layer and the over-parameterization of the network. As an extreme example of this trade-off, we show that (i) if subsets of features at each layer are linearly separable, then almost no over-parameterization is needed, and (ii) under generic assumptions on the features at each layer, it suffices that the last two hidden layers have $Omega(sqrt{N})$ neurons, $N$ being the number of samples. Finally, we provide experimental evidence demonstrating that the presented condition is satisfied in practical settings even when dropout stability does not hold.

التعلم الآلي التعلم الالي

Parallelism versus Latency in Simplified Successive-Cancellation Decoding of Polar Codes

131 - Seyyed Ali Hashemi , Marco Mondelli , Arman Fazeli 2020

This paper characterizes the latency of the simplified successive-cancellation (SSC) decoding scheme for polar codes under hardware resource constraints. In particular, when the number of processing elements $P$ that can perform SSC decoding operatio ns in parallel is limited, as is the case in practice, the latency of SSC decoding is $Oleft(N^{1-1/mu}+frac{N}{P}log_2log_2frac{N}{P}right)$, where $N$ is the block length of the code and $mu$ is the scaling exponent of the channel. Three direct consequences of this bound are presented. First, in a fully-parallel implementation where $P=frac{N}{2}$, the latency of SSC decoding is $Oleft(N^{1-1/mu}right)$, which is sublinear in the block length. This recovers a result from our earlier work. Second, in a fully-serial implementation where $P=1$, the latency of SSC decoding scales as $Oleft(Nlog_2log_2 Nright)$. The multiplicative constant is also calculated: we show that the latency of SSC decoding when $P=1$ is given by $left(2+o(1)right) Nlog_2log_2 N$. Third, in a semi-parallel implementation, the smallest $P$ that gives the same latency as that of the fully-parallel implementation is $P=N^{1/mu}$. The tightness of our bound on SSC decoding latency and the applicability of the foregoing results is validated through extensive simulations.

نظرية المعلومات نظرية المعلومات

Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks

107 - Quynh Nguyen , Marco Mondelli , Guido Montufar 2020

A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of grad ient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of $N$ neurons, $N$ being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps.

التعلم الالي التعلم الآلي

Approximate Message Passing with Spectral Initialization for Generalized Linear Models

270 - Marco Mondelli , Ramji Venkataramanan 2020

We consider the problem of estimating a signal from measurements obtained via a generalized linear model. We focus on estimators based on approximate message passing (AMP), a family of iterative algorithms with many appealing features: the performanc e of AMP in the high-dimensional limit can be succinctly characterized under suitable model assumptions; AMP can also be tailored to the empirical distribution of the signal entries, and for a wide class of estimation problems, AMP is conjectured to be optimal among all polynomial-time algorithms. However, a major issue of AMP is that in many models (such as phase retrieval), it requires an initialization correlated with the ground-truth signal and independent from the measurement matrix. Assuming that such an initialization is available is typically not realistic. In this paper, we solve this problem by proposing an AMP algorithm initialized with a spectral estimator. With such an initialization, the standard AMP analysis fails since the spectral estimator depends in a complicated way on the design matrix. Our main contribution is a rigorous characterization of the performance of AMP with spectral initialization in the high-dimensional limit. The key technical idea is to define and analyze a two-phase artificial AMP algorithm that first produces the spectral estimator, and then closely approximates the iterates of the true AMP. We also provide numerical results that demonstrate the validity of the proposed approach.

التعلم الالي نظرية المعلومات التعلم الآلي

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

127 - Marco Mondelli , Christos Thrampoulidis , Ramji Venkataramanan 2020

We study the problem of recovering an unknown signal $boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $hat{boldsymbol x}^{rm L}$ and a spe ctral estimator $hat{boldsymbol x}^{rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $hat{boldsymbol x}^{rm L}$ and $hat{boldsymbol x}^{rm s}$. At the heart of our analysis is the exact characterization of the joint empirical distribution of $(boldsymbol x, hat{boldsymbol x}^{rm L}, hat{boldsymbol x}^{rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $hat{boldsymbol x}^{rm L}$ and $hat{boldsymbol x}^{rm s}$, given the limiting distribution of the signal $boldsymbol x$. When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form $thetahat{boldsymbol x}^{rm L}+hat{boldsymbol x}^{rm s}$ and we derive the optimal combination coefficient. In order to establish the limiting distribution of $(boldsymbol x, hat{boldsymbol x}^{rm L}, hat{boldsymbol x}^{rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $hat{boldsymbol x}^{rm L}$ and approach $hat{boldsymbol x}^{rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.

التعلم الالي نظرية المعلومات التعلم الآلي

Sublinear Latency for Simplified Successive Cancellation Decoding of Polar Codes

94 - Marco Mondelli , Seyyed Ali Hashemi , John Cioffi 2019

This work analyzes the latency of the simplified successive cancellation (SSC) decoding scheme for polar codes proposed by Alamdar-Yazdi and Kschischang. It is shown that, unlike conventional successive cancellation decoding, where latency is linear in the block length, the latency of SSC decoding is sublinear. More specifically, the latency of SSC decoding is $O(N^{1-1/mu})$, where $N$ is the block length and $mu$ is the scaling exponent of the channel, which captures the speed of convergence of the rate to capacity. Numerical results demonstrate the tightness of the bound and show that most of the latency reduction arises from the parallel decoding of subcodes of rate $0$ or $1$.

نظرية المعلومات نظرية المعلومات

On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition

111 - Marco Mondelli , Andrea Montanari 2018

We establish connections between the problem of learning a two-layer neural network and tensor decomposition. We consider a model with feature vectors $boldsymbol x in mathbb R^d$, $r$ hidden units with weights ${boldsymbol w_i}_{1le i le r}$ and out put $yin mathbb R$, i.e., $y=sum_{i=1}^r sigma( boldsymbol w_i^{mathsf T}boldsymbol x)$, with activation functions given by low-degree polynomials. In particular, if $sigma(x) = a_0+a_1x+a_3x^3$, we prove that no polynomial-time learning algorithm can outperform the trivial predictor that assigns to each example the response variable $mathbb E(y)$, when $d^{3/2}ll rll d^2$. Our conclusion holds for a `natural data distribution, namely standard Gaussian feature vectors $boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition. Roughly speaking, we assume that no polynomial-time algorithm can substantially outperform current methods for tensor decomposition based on the sum-of-squares hierarchy. We also prove generalizations of this statement for higher degree polynomial activations, and non-random weight vectors. Remarkably, several existing algorithms for learning two-layer networks with rigorous guarantees are based on tensor decomposition. Our results support the idea that this is indeed the core computational difficulty in learning such networks, under the stated generative model for the data. As a side result, we show that under this model learning the network requires accurate learning of its weights, a property that does not hold in a more general setting.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Binary Linear Codes with Optimal Scaling: Polar Codes with Large Kernels

89 - Arman Fazeli , S. Hamed Hassani , Marco Mondelli 2017

We prove that, for the binary erasure channel (BEC), the polar-coding paradigm gives rise to codes that not only approach the Shannon limit but do so under the best possible scaling of their block length as a~function of the gap to capacity. This res ult exhibits the first known family of binary codes that attain both optimal scaling and quasi-linear complexity of encoding and decoding. Our proof is based on the construction and analysis of binary polar codes with large kernels. When communicating reliably at rates within $varepsilon > 0$ of capacity, the code length $n$ often scales as $O(1/varepsilon^{mu})$, where the constant $mu$ is called the scaling exponent. It is known that the optimal scaling exponent is $mu=2$, and it is achieved by random linear codes. The scaling exponent of conventional polar codes (based on the $2times 2$ kernel) on the BEC is $mu=3.63$. This falls far short of the optimal scaling guaranteed by random codes. Our main contribution is a rigorous proof of the following result: for the BEC, there exist $elltimesell$ binary kernels, such that polar codes constructed from these kernels achieve scaling exponent $mu(ell)$ that tends to the optimal value of $2$ as $ell$ grows. We furthermore characterize precisely how large $ell$ needs to be as a function of the gap between $mu(ell)$ and $2$. The resulting binary codes maintain the recursive structure of conventional polar codes, and thereby achieve construction complexity $O(n)$ and encoding/decoding complexity $O(nlog n)$.

نظرية المعلومات نظرية المعلومات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد