أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Baharan Mirzasoleiman

90 - Baharan Mirzasoleiman , Amin Karbasi , Rik Sarkar 2014

Many large-scale machine learning problems--clustering, non-parametric learning, kernel machines, etc.--require selecting a small yet representative subset from a large dataset. Such problems can often be reduced to maximizing a submodular set functi on subject to various constraints. Classical approaches to submodular optimization require centralized access to the full dataset, which is impractical for truly large-scale problems. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol GreeDi, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show that under certain natural conditions, performance close to the centralized approach can be achieved. We begin with monotone submodular maximization subject to a cardinality constraint, and then extend this approach to obtain approximation guarantees for (not necessarily monotone) submodular maximization subject to more general constraints including matroid or knapsack constraints. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar based clustering on tens of millions of examples using Hadoop.

التعلم الآلي الذكاء الاصطناعي النظم الموزعة والتوازية والحوسبة العنقودية

Lazier Than Lazy Greedy

74 - Baharan Mirzasoleiman , Ashwinkumar Badanidiyuru , Amin Karbasi 2014

Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a ge neral monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a $(1-1/e-varepsilon)$ approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not evaluate the whole fraction of data points even once and still achieves indistinguishable results compared to lazy greedy.

التعلم الآلي بنى وهياكل البيانات والخوارزميات استرجاع المعلومات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد