
In the Nonnegative Matrix Factorization (NMF) problem we are given an $n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal is to express $M$ as $AW$ where $A$ and $W$ are nonnegative matrices of size $n \times r$ and $r \times m$ respectively. In some applications, it makes sense to ask instead for the product $AW$ to approximate $M$ -- i.e. (approximately) minimize $\|M - AW\|_F$, where $\|\cdot\|_F$ denotes the Frobenius norm; we refer to this as Approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where $A$ and $W$ are computed using a variety of local search heuristics. Vavasis proved that this problem is NP-hard. We initiate a study of when this problem is solvable in polynomial time: 1. We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed NMF is most interesting in applications precisely when $r$ is small. 2. We complement this with a hardness result: if exact NMF can be solved in time $(nm)^{o(r)}$, then 3-SAT has a sub-exponential time algorithm. This rules out substantial improvements to the above algorithm. 3. We give an algorithm that runs in time polynomial in $n$, $m$ and $r$ under the separability condition identified by Donoho and Stodden in 2003. The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input, and we believe that this will be an interesting and important direction for future work.
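For concreteness, the kind of local search heuristic mentioned above can be sketched in a few lines of NumPy. The multiplicative-update rule below is one standard choice for minimizing $\|M - AW\|_F$; it is not the polynomial-time algorithm described in this work, and the matrix, rank and iteration count are purely illustrative.

```python
import numpy as np

def nmf_multiplicative(M, r, iters=200, eps=1e-9, seed=0):
    """Approximate NMF via multiplicative updates.

    Minimizes ||M - A W||_F over nonnegative A (n x r) and W (r x m).
    This is a common local-search heuristic, not the exact algorithm
    of the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    A = rng.random((n, r))
    W = rng.random((r, m))
    for _ in range(iters):
        # Update W, then A; eps guards against division by zero.
        W *= (A.T @ M) / (A.T @ A @ W + eps)
        A *= (M @ W.T) / (A @ W @ W.T + eps)
    return A, W

# Toy usage: factor a random 20 x 30 nonnegative matrix with r = 3.
M = np.random.default_rng(1).random((20, 30))
A, W = nmf_multiplicative(M, r=3)
print(np.linalg.norm(M - A @ W))  # Frobenius reconstruction error
```

Like all local-search heuristics for NMF, this converges only to a stationary point and offers no global guarantee, which is precisely the gap the abstract's provable algorithms address.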
Recent research has shown that language and the socio-cognitive phenomena associated with it can be aptly modeled and visualized through networks of linguistic entities. However, most of the existing works on linguistic networks focus only on the local properties of the networks. This study is an attempt to analyze the structure of languages via a purely structural technique, namely spectral analysis, which is ideally suited for discovering the global correlations in a network. Application of this technique to PhoNet, the co-occurrence network of consonants, not only reveals several natural linguistic principles governing the structure of the consonant inventories, but is also able to quantify their relative importance. We believe that this powerful technique can be successfully applied, in general, to study the structure of natural languages.
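The spectral technique itself is easy to reproduce on any co-occurrence network: one inspects the eigenvalues and leading eigenvectors of the (weighted) adjacency matrix. The small matrix below is only a stand-in for PhoNet, whose data is not reproduced here.

```python
import numpy as np

# Minimal sketch of spectral analysis of a co-occurrence network.
# The 5-node weighted adjacency matrix is an illustrative stand-in
# for PhoNet (consonant co-occurrence counts across inventories).
A = np.array([
    [0, 4, 1, 0, 2],
    [4, 0, 3, 1, 0],
    [1, 3, 0, 5, 2],
    [0, 1, 5, 0, 1],
    [2, 0, 2, 1, 0],
], dtype=float)

# Eigen-decomposition of the symmetric adjacency matrix: the spectrum
# captures global correlation structure that local measures miss.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# The eigenvalues of largest magnitude (and their eigenvectors)
# highlight the dominant organizing principles of the network.
order = np.argsort(-np.abs(eigenvalues))
for i in order[:2]:
    print(f"lambda = {eigenvalues[i]:+.3f}, "
          f"eigenvector = {np.round(eigenvectors[:, i], 3)}")
```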
Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph G(n,1/2) is almost surely either k or k+1, where k = 2log n - 2log(log n) - 1. However, a simple greedy algorithm finds a clique of size only (1+o(1))log n, with high probability, and finding larger cliques -- even of size (1+epsilon)log n -- in randomized polynomial time has been a long-standing open problem. In this paper, we study the following generalization: given a random graph G(n,1/2), find the largest subgraph with edge density at least (1-delta). We show that a simple modification of the greedy algorithm finds a subset of 2log n vertices whose induced subgraph has edge density at least 0.951, with high probability. To complement this, we show that almost surely there is no subset of 2.784log n vertices whose induced subgraph has edge density 0.951 or more.
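To make the greedy idea concrete, the sketch below samples G(n, 1/2) and grows a vertex set by repeatedly adding the outside vertex with the most edges into the current set, then reports the induced edge density. This is an illustrative greedy in the spirit of the abstract, not necessarily the exact modification analyzed in the paper; the target size of roughly 2log n vertices is taken from the statement above, and n, the seed and the starting vertex are arbitrary.

```python
import math
import random

def greedy_dense_subset(n=512, p=0.5, seed=0):
    """Grow a vertex set greedily in G(n, p) and report its edge density.

    Illustrative greedy (not necessarily the paper's exact modification):
    start from an arbitrary vertex and repeatedly add the vertex with the
    most neighbors inside the current set, until about 2*log2(n) vertices.
    """
    rng = random.Random(seed)
    # Sample G(n, p) as a symmetric boolean adjacency matrix.
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u][v] = adj[v][u] = True

    target = max(2, round(2 * math.log2(n)))
    chosen = [0]
    while len(chosen) < target:
        # Add the outside vertex with the most edges into the chosen set.
        best = max((v for v in range(n) if v not in chosen),
                   key=lambda v: sum(adj[v][u] for u in chosen))
        chosen.append(best)

    edges = sum(adj[u][v] for i, u in enumerate(chosen) for v in chosen[i + 1:])
    pairs = len(chosen) * (len(chosen) - 1) // 2
    return len(chosen), edges / pairs

size, density = greedy_dense_subset()
print(f"|S| = {size}, induced edge density = {density:.3f}")
```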
