Do you want to publish a course? Click here

Bipartite Correlation Clustering -- Maximizing Agreements

80   0   0.0 ( 0 )
 Added by Megasthenis Asteris
 Publication date 2016
and research's language is English




Ask ChatGPT about the research

In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph $G$ with `+ and `- edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all `+ edges within clusters plus all `- edges cut across clusters. BCC is known to be NP-hard. We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters. Our algorithm outputs a $k$-clustering that provably achieves a number of agreements within a multiplicative ${(1-delta)}$-factor from the optimal, for any desired accuracy $delta$. It relies on solving a combinatorially constrained bilinear maximization on the bi-adjacency matrix of $G$. It runs in time exponential in $k$ and $delta^{-1}$, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, an ${(1-delta)}$-approximation can be achieved by $O(delta^{-1})$ clusters regardless of the size of the graph. In turn, our $k$-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.



rate research

Read More

Recently, Hierarchical Clustering (HC) has been considered through the lens of optimization. In particular, two maximization objectives have been defined. Moseley and Wang defined the emph{Revenue} objective to handle similarity information given by a weighted graph on the data points (w.l.o.g., $[0,1]$ weights), while Cohen-Addad et al. defined the emph{Dissimilarity} objective to handle dissimilarity information. In this paper, we prove structural lemmas for both objectives allowing us to convert any HC tree to a tree with constant number of internal nodes while incurring an arbitrarily small loss in each objective. Although the best-known approximations are 0.585 and 0.667 respectively, using our lemmas we obtain approximations arbitrarily close to 1, if not all weights are small (i.e., there exist constants $epsilon, delta$ such that the fraction of weights smaller than $delta$, is at most $1 - epsilon$); such instances encompass many metric-based similarity instances, thereby improving upon prior work. Finally, we introduce Hierarchical Correlation Clustering (HCC) to handle instances that contain similarity and dissimilarity information simultaneously. For HCC, we provide an approximation of 0.4767 and for complementary similarity/dissimilarity weights (analogous to $+/-$ correlation clustering), we again present nearly-optimal approximations.
Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms need to be both time and and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on $n$ nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, $O(ncdot mbox{polylog}~n)$-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the quality of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in $O(ncdot mbox{polylog}~n)$-space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.
We investigate online scheduling with commitment for parallel identical machines. Our objective is to maximize the total processing time of accepted jobs. As soon as a job has been submitted, the commitment constraint forces us to decide immediately whether we accept or reject the job. Upon acceptance of a job, we must complete it before its deadline $d$ that satisfies $d geq (1+epsilon)cdot p + r$, with $p$ and $r$ being the processing time and the submission time of the job, respectively while $epsilon>0$ is the slack of the system. Since the hard case typically arises for near-tight deadlines, we consider $varepsilonleq 1$. We use competitive analysis to evaluate our algorithms. Our first main contribution is a deterministic preemptive online algorithm with an almost tight competitive ratio on any number of machines. For a single machine, the competitive factor matches the optimal bound $frac{1+epsilon}{epsilon}$ of the greedy acceptance policy. Then the competitive ratio improves with an increasing number of machines and approaches $(1+epsilon)cdotln frac{1+epsilon}{epsilon}$ as the number of machines converges to infinity. This is an exponential improvement over the greedy acceptance policy for small $epsilon$. In the non-preemptive case, we present a deterministic algorithm on $m$ machines with a competitive ratio of $1+mcdot left(frac{1+epsilon}{epsilon}right)^{frac{1}{m}}$. This matches the optimal bound of $2+frac{1}{epsilon}$ of the greedy acceptance policy for a single machine while it again guarantees an exponential improvement over the greedy acceptance policy for small $epsilon$ and large $m$. In addition, we determine an almost tight lower bound that approaches $mcdot left(frac{1}{epsilon}right)^{frac{1}{m}}$ for large $m$ and small $epsilon$.
We study the problem of matching agents who arrive at a marketplace over time and leave after d time periods. Agents can only be matched while they are present in the marketplace. Each pair of agents can yield a different match value, and the planners goal is to maximize the total value over a finite time horizon. We study matching algorithms that perform well over any sequence of arrivals when there is no a priori information about the match values or arrival times. Our main contribution is a 1/4-competitive algorithm. The algorithm randomly selects a subset of agents who will wait until right before their departure to get matched, and maintains a maximum-weight matching with respect to the other agents. The primal-dual analysis of the algorithm hinges on a careful comparison between the initial dual value associated with an agent when it first arrives, and the final value after d time steps. It is also shown that no algorithm is 1/2-competitive. We extend the model to the case in which departure times are drawn i.i.d from a distribution with non-decreasing hazard rate, and establish a 1/8-competitive algorithm in this setting. Finally we show on real-world data that a modified version of our algorithm performs well in practice.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا