ترغب بنشر مسار تعليمي؟ اضغط هنا

Mixtures of high dimensional Gaussian distributions have been studied extensively in statistics and learning theory. While the total variation distance appears naturally in the sample complexity of distribution learning, it is analytically difficult to obtain tight lower bounds for mixtures. Exploiting a connection between total variation distance and the characteristic function of the mixture, we provide fairly tight functional approximations. This enables us to derive new lower bounds on the total variation distance between pairs of two-component Gaussian mixtures that have a shared covariance matrix.
We study the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines. We consider a standard distributed setting with one central machine (parameter server) communicating with many worker machines. Our proposed algorithm is a variant of the celebrated cubic-regularized Newton method of Nesterov and Polyak cite{nest}, which avoids saddle points efficiently and converges to local minima. Furthermore, our algorithm resists the presence of Byzantine machines, which may create emph{fake local minima} near the saddle points of the loss function, also known as saddle-point attack. We robustify the cubic-regularized Newton algorithm such that it avoids the saddle points and the fake local minimas efficiently. Furthermore, being a second order algorithm, the iteration complexity is much lower than its first order counterparts, and thus our algorithm communicates little with the parameter server. We obtain theoretical guarantees for our proposed scheme under several settings including approximate (sub-sampled) gradients and Hessians. Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks.
In this paper, we present distributed generalized clustering algorithms that can handle large scale data across multiple machines in spite of straggling or unreliable machines. We propose a novel data assignment scheme that enables us to obtain globa l information about the entire data even when some machines fail to respond with the results of the assigned local computations. The assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.
We propose a new mechanism to accurately answer a user-provided set of linear counting queries under local differential privacy (LDP). Given a set of linear counting queries (the workload) our mechanism automatically adapts to provide accuracy on the workload queries. We define a parametric class of mechanisms that produce unbiased estimates of the workload, and formulate a constrained optimization problem to select a mechanism from this class that minimizes expected total squared error. We solve this optimization problem numerically using projected gradient descent and provide an efficient implementation that scales to large workloads. We demonstrate the effectiveness of our optimization-based approach in a wide variety of settings, showing that it outperforms many competitors, even outperforming existing mechanisms on the workloads for which they were intended.
328 - Arya Mazumdar 2018
Motivated by applications in distributed storage, the notion of a locally recoverable code (LRC) was introduced a few years back. In an LRC, any coordinate of a codeword is recoverable by accessing only a small number of other coordinates. While diff erent properties of LRCs have been well-studied, their performance on channels with random erasures or errors has been mostly unexplored. In this note, we analyze the performance of LRCs over such stochastic channels. In particular, for input-symmetric discrete memoryless channels, we give a tight characterization of the gap to Shannon capacity when LRCs are used over the channel.
Index coding, a source coding problem over broadcast channels, has been a subject of both theoretical and practical interest since its introduction (by Birk and Kol, 1998). In short, the problem can be defined as follows: there is an input $textbf{x} triangleq (textbf{x}_1, dots, textbf{x}_n)$, a set of $n$ clients who each desire a single symbol $textbf{x}_i$ of the input, and a broadcaster whose goal is to send as few messages as possible to all clients so that each one can recover its desired symbol. Additionally, each client has some predetermined side information, corresponding to certain symbols of the input $textbf{x}$, which we represent as the side information graph $mathcal{G}$. The graph $mathcal{G}$ has a vertex $v_i$ for each client and a directed edge $(v_i, v_j)$ indicating that client $i$ knows the $j$th symbol of the input. Given a fixed side information graph $mathcal{G}$, we are interested in determining or approximating the broadcast rate of index coding on the graph, i.e. the fewest number of messages the broadcaster can transmit so that every client gets their desired information. Using index coding schemes based on linear programs (LPs), we take a two-pronged approach to approximating the broadcast rate. First, extending earlier work on planar graphs, we focus on approximating the broadcast rate for special graph families such as graphs with small chromatic number and disk graphs. In certain cases, we are able to show that simple LP-based schemes give constant-factor approximations of the broadcast rate, which seem extremely difficult to obtain in the general case. Second, we provide several LP-based schemes for the general case which are not constant-factor approximations, but which strictly improve on the prior best-known schemes.
Recently Ermon et al. (2013) pioneered a way to practically compute approximations to large scale counting or discrete integration problems by using random hashes. The hashes are used to reduce the counting problem into many separate discrete optimiz ation problems. The optimization problems then can be solved by an NP-oracle such as commercial SAT solvers or integer linear programming (ILP) solvers. In particular, Ermon et al. showed that if the domain of integration is ${0,1}^n$ then it is possible to obtain a solution within a factor of $16$ of the optimal (a 16-approximation) by this technique. In many crucial counting tasks, such as computation of partition function of ferromagnetic Potts model, the domain of integration is naturally ${0,1,dots, q-1}^n, q>2$, the hypergrid. The straightforward extension of Ermon et al.s method allows a $q^2$-approximation for this problem. For large values of $q$, this is undesirable. In this paper, we show an improved technique to obtain an approximation factor of $4+O(1/q^2)$ to this problem. We are able to achieve this by using an idea of optimization over multiple bins of the hash functions, that can be easily implemented by inequality constraints, or even in unconstrained way. Also the burden on the NP-oracle is not increased by our method (an ILP solver can still be used). We provide experimental simulation results to support the theoretical guarantees of our algorithms.
This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of {em straggling} processors. To mitigate the effect of the stragglers, it has been previously proposed to en code the data with an erasure-correcting code and decode at the master server at the end of the computation. We, instead, propose to encode the second-moment of the data with a low density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system. We show that for a random model for stragglers, the proposed moment encoding based gradient descent method can be viewed as the stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment encoding based method is shown to outperform the existing schemes in a real distributed computing setup.
Rectified linear units, or ReLUs, have become the preferred activation function for artificial neural networks. In this paper we consider two basic learning problems assuming that the underlying data follow a generative model based on a ReLU-network -- a neural network with ReLU activations. As a primarily theoretical study, we limit ourselves to a single-layer network. The first problem we study corresponds to dictionary-learning in the presence of nonlinearity (modeled by the ReLU functions). Given a set of observation vectors $mathbf{y}^i in mathbb{R}^d, i =1, 2, dots , n$, we aim to recover $dtimes k$ matrix $A$ and the latent vectors ${mathbf{c}^i} subset mathbb{R}^k$ under the model $mathbf{y}^i = mathrm{ReLU}(Amathbf{c}^i +mathbf{b})$, where $mathbf{b}in mathbb{R}^d$ is a random bias. We show that it is possible to recover the column space of $A$ within an error of $O(d)$ (in Frobenius norm) under certain conditions on the probability distribution of $mathbf{b}$. The second problem we consider is that of robust recovery of the signal in the presence of outliers, i.e., large but sparse noise. In this setting we are interested in recovering the latent vector $mathbf{c}$ from its noisy nonlinear sketches of the form $mathbf{v} = mathrm{ReLU}(Amathbf{c}) + mathbf{e}+mathbf{w}$, where $mathbf{e} in mathbb{R}^d$ denotes the outliers with sparsity $s$ and $mathbf{w} in mathbb{R}^d$ denote the dense but small noise. This line of work has recently been studied (Soltanolkotabi, 2017) without the presence of outliers. For this problem, we show that a generalized LASSO algorithm is able to recover the signal $mathbf{c} in mathbb{R}^k$ within an $ell_2$ error of $O(sqrt{frac{(k+s)log d}{d}})$ when $A$ is a random Gaussian matrix.
Motivated by applications in distributed storage, the storage capacity of a graph was recently defined to be the maximum amount of information that can be stored across the vertices of a graph such that the information at any vertex can be recovered from the information stored at the neighboring vertices. Computing the storage capacity is a fundamental problem in network coding and is related, or equivalent, to some well-studied problems such as index coding with side information and generalized guessing games. In this paper, we consider storage capacity as a natural information-theoretic analogue of the minimum vertex cover of a graph. Indeed, while it was known that storage capacity is upper bounded by minimum vertex cover, we show that by treating it as such we can get a 3/2 approximation for planar graphs, and a 4/3 approximation for triangle-free planar graphs. Since the storage capacity is intimately related to the index coding rate, we get a 2 approximation of index coding rate for planar graphs and 3/2 approximation for triangle-free planar graphs. We also show a polynomial time approximation scheme for the index coding rate when the alphabet size is constant. We then develop a general method of gadget covering to upper bound the storage capacity in terms of the average of a set of vertex covers. This method is intuitive and leads to the exact characterization of storage capacity for various families of graphs. As an illustrative example, we use this approach to derive the exact storage capacity of cycles-with-chords, a family of graphs related to outerplanar graphs. Finally, we generalize the storage capacity notion to include recovery from partial node failures in distributed storage. We show tight upper and lower bounds on this partial recovery capacity that scales nicely with the fraction of failures in a vertex.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا