Research papers, master's theses, and doctoral dissertations published by Dimitris S. Papailiopoulos

On the Worst-Case Approximability of Sparse PCA

Siu On Chan, Dimitris Papailiopoulos, Aviad Rubinstein (2015)

It is well known that Sparse PCA (Sparse Principal Component Analysis) is NP-hard to solve exactly on worst-case instances. What is the complexity of solving Sparse PCA approximately? Our contributions include: 1) a simple and efficient algorithm that achieves an $n^{-1/3}$-approximation; 2) NP-hardness of approximation to within $(1-\varepsilon)$, for some small constant $\varepsilon > 0$; 3) SSE-hardness (hardness under the Small Set Expansion hypothesis) of approximation to within any constant factor; and 4) an $\exp\exp\left(\Omega\left(\sqrt{\log \log n}\right)\right)$ (quasi-quasi-polynomial) gap for the standard semidefinite program.
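The abstract leaves the optimization problem implicit. Assuming the standard formulation (a symmetric positive semidefinite $A \in \mathbb{R}^{n \times n}$ and a target sparsity $k$; the symbols $\mathrm{OPT}$, $\hat{x}$ and $\alpha$ are introduced here for illustration), Sparse PCA and the approximation ratios quoted in the abstract read:

```latex
\begin{aligned}
\mathrm{OPT}(A,k) &= \max_{x \in \mathbb{R}^n} \; x^\top A x
  \quad \text{s.t.} \quad \|x\|_2 = 1,\ \ \|x\|_0 \le k, \\
\hat{x}^\top A \hat{x} &\ge \alpha \cdot \mathrm{OPT}(A,k)
  \quad \text{for an } \alpha\text{-approximation } (0 < \alpha \le 1),
  \ \text{e.g. } \alpha = n^{-1/3}.
\end{aligned}
```

Under this convention, contribution 2) says that guaranteeing $\alpha = 1 - \varepsilon$ on every instance is already NP-hard.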

Machine Learning; Computational Complexity; Data Structures and Algorithms

Parallel Correlation Clustering on Big Graphs

Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak (2015)

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices and obtains a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that provably run in a polylogarithmic number of rounds and achieve nearly linear speedups. C4 uses concurrency control to enforce serializability of a parallel clustering process and guarantees a 3-approximation ratio. ClusterWild! is a coordination-free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We provide extensive experimental results for both algorithms, in which we outperform the state of the art in terms of both clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15x speedup.
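The abstract describes KwikCluster only in one line. A minimal serial sketch, following the standard pivot-based description (repeatedly pick a uniformly random unclustered pivot and cluster it with its still-unclustered positive neighbors; this gives a 3-approximation in expectation), might look as follows. The function names `kwik_cluster` and `disagreements` and the edge-list input format are hypothetical choices for illustration; the parallel variants C4 and ClusterWild! are not sketched here.

```python
import random


def kwik_cluster(n, positive_edges, seed=None):
    """Serial KwikCluster on vertices 0..n-1 (hypothetical helper).

    positive_edges: pairs (u, v) marked "similar"; every other pair is "dissimilar".
    """
    rng = random.Random(seed)
    neighbors = {v: set() for v in range(n)}
    for u, v in positive_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    unclustered = set(range(n))
    clusters = []
    while unclustered:
        # Pick a uniformly random pivot among the vertices not yet clustered.
        pivot = rng.choice(sorted(unclustered))
        # The pivot takes all of its positive neighbors that are still unclustered.
        cluster = {pivot} | (neighbors[pivot] & unclustered)
        clusters.append(cluster)
        unclustered -= cluster
    return clusters


def disagreements(n, positive_edges, clusters):
    """CC objective: positive pairs split apart plus negative pairs put together."""
    label = {v: c for c, members in enumerate(clusters) for v in members}
    positive = {frozenset(e) for e in positive_edges}
    cost = 0
    for u in range(n):
        for v in range(u + 1, n):
            together = label[u] == label[v]
            similar = frozenset((u, v)) in positive
            cost += together != similar  # counts 1 for each disagreeing pair
    return cost


# Two triangles {0,1,2} and {3,4,5} plus one "noisy" positive edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = kwik_cluster(6, edges, seed=7)
print(clusters, disagreements(6, edges, clusters))
```

Each round removes the pivot's whole neighborhood, so the number of rounds depends on how many pivots are drawn one after another; this sequential dependence is the bottleneck the abstract refers to.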

Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms; Machine Learning

Sparse Principal Component of a Rank-deficient Matrix

Megasthenis Asteris, Dimitris S. Papailiopoulos (2011)

We consider the problem of identifying the sparse principal component of a rank-deficient matrix. We introduce auxiliary spherical variables and prove that there exists a set of candidate index-sets (that is, sets of indices of the nonzero elements of the vector argument) whose size is polynomially bounded in terms of the rank and which contains the optimal index-set, i.e., the index-set of the nonzero elements of the optimal solution. Finally, we develop an algorithm that computes the optimal sparse principal component in polynomial time for any sparsity degree.
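The candidate-set idea can be illustrated in the simplest nontrivial case, rank 2. This is a minimal sketch under stated assumptions, not the authors' general-rank algorithm: it assumes $A = VV^\top$ is given through an $n \times 2$ factor $V$ with rows $(a_i, b_i)$, and all function names are hypothetical. With the auxiliary angle $\phi$ and $v_i(\phi) = a_i\cos\phi + b_i\sin\phi$, one has $x^\top A x = \max_\phi \big(\sum_i v_i(\phi)\, x_i\big)^2$, so for the optimal $\phi$ the best support is the set of $k$ largest $|v_i(\phi)|$. That set changes only at the $O(n^2)$ angles where two magnitudes cross, so sampling one angle per interval yields a polynomial-size candidate list that contains the optimal index-set.

```python
import math
import random
from itertools import combinations


def top_eig_2x2(rows):
    """lambda_max of A_S = V_S V_S^T, computed via the 2x2 Gram matrix V_S^T V_S."""
    a = sum(x * x for x, _ in rows)
    d = sum(y * y for _, y in rows)
    b = sum(x * y for x, y in rows)
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)


def rank2_sparse_pc(V, k):
    """Exact k-sparse PC value for A = V V^T, V given as n rows (a_i, b_i).

    Hypothetical helper: sweeps the auxiliary angle phi over [0, pi).
    """
    n = len(V)
    # Critical angles where |v_i(phi)| = |v_j(phi)|.
    angles = {0.0, math.pi}
    for i in range(n):
        for j in range(i + 1, n):
            for s in (-1.0, 1.0):  # s = -1: v_i = v_j;  s = +1: v_i = -v_j
                da = V[i][0] + s * V[j][0]
                db = V[i][1] + s * V[j][1]
                if da == 0.0 and db == 0.0:
                    continue  # the two magnitudes agree for every phi
                # Solve da*cos(phi) + db*sin(phi) = 0, folded into [0, pi).
                angles.add(math.atan2(da, -db) % math.pi)
    angles = sorted(angles)
    # Between consecutive critical angles the top-k set is constant.
    candidates = set()
    for lo, hi in zip(angles, angles[1:]):
        phi = (lo + hi) / 2
        mags = [abs(a * math.cos(phi) + b * math.sin(phi)) for a, b in V]
        top = sorted(range(n), key=lambda i: mags[i], reverse=True)[:k]
        candidates.add(frozenset(top))
    best = max(candidates, key=lambda S: top_eig_2x2([V[i] for i in S]))
    return top_eig_2x2([V[i] for i in best]), sorted(best), len(candidates)


def brute_force(V, k):
    """Reference answer: try all C(n, k) supports."""
    return max(top_eig_2x2([V[i] for i in S]) for S in combinations(range(len(V)), k))


random.seed(0)
V = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10)]
value, support, n_candidates = rank2_sparse_pc(V, k=4)
print(value, support, n_candidates, abs(value - brute_force(V, 4)) < 1e-9)
```

The brute-force check enumerates all $\binom{n}{k}$ supports, whereas the sweep evaluates at most $O(n^2)$ candidates; for a rank-2 matrix, $\lambda_{\max}$ of each principal submatrix reduces to a closed-form $2 \times 2$ eigenvalue.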

Information Theory; Machine Learning; Systems and Control
