No Arabic abstract
We correct a paper previously submitted to CoRR. That paper claimed that the algorithm there described was provably of linear time complexity in the average case. The alleged proof of that statement contained an error, being based on an invalid assumption, and is invalid. In this paper we present both experimental and analytical evidence that the time complexity is of order $N^2$ in the average case, where $N$ is the total length of the merged sequences.
In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides the two arrays into as many pairs of partitions as there are threads available; such that each thread can later merge a pair of partitions independently of the others. We extend the existing method by proposing a new algorithm to find the median of two partitions. Additionally, we provide a new strategy to divide the input arrays where we minimize the data movement, but at the cost of making this stage sequential. Finally, we provide the so-called linear shifting algorithm that swaps two partitions in-place with contiguous data access. We emphasize that our approach is straightforward to implement and that it can also be used for external (out of place) merging. The results demonstrate that it provides a significant speedup compared to sequential executions, when the size of the arrays is greater than a thousand elements.
Linear regression in $ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p in [2,infty).$ Our algorithm is simple to implement and is guaranteed to find a $(1+varepsilon)$-approximate solution in $O(p^{3.5} m^{frac{p-2}{2(p-1)}} log frac{m}{varepsilon}) le O_p(sqrt{m} log frac{m}{varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.
We study the design of local algorithms for massive graphs. A local algorithm is one that finds a solution containing or near a given vertex without looking at the whole graph. We present a local clustering algorithm. Our algorithm finds a good cluster--a subset of vertices whose internal connections are significantly richer than its external connections--near a given vertex. The running time of our algorithm, when it finds a non-empty local cluster, is nearly linear in the size of the cluster it outputs. Our clustering algorithm could be a useful primitive for handling massive graphs, such as social networks and web-graphs. As an application of this clustering algorithm, we present a partitioning algorithm that finds an approximate sparsest cut with nearly optimal balance. Our algorithm takes time nearly linear in the number edges of the graph. Using the partitioning algorithm of this paper, we have designed a nearly-linear time algorithm for constructing spectral sparsifiers of graphs, which we in turn use in a nearly-linear time algorithm for solving linear systems in symmetric, diagonally-dominant matrices. The linear system solver also leads to a nearly linear-time algorithm for approximating the second-smallest eigenvalue and corresponding eigenvector of the Laplacian matrix of a graph. These other results are presented in two companion papers.
The optimization version of the Unique Label Cover problem is at the heart of the Unique Games Conjecture which has played an important role in the proof of several tight inapproximability results. In recent years, this problem has been also studied extensively from the point of view of parameterized complexity. Cygan et al. [FOCS 2012] proved that this problem is fixed-parameter tractable (FPT) and Wahlstrom [SODA 2014] gave an FPT algorithm with an improved parameter dependence. Subsequently, Iwata, Wahlstrom and Yoshida [2014] proved that the edge version of Unique Label Cover can be solved in linear FPT-time. That is, there is an FPT algorithm whose dependence on the input-size is linear. However, such an algorithm for the node version of the problem was left as an open problem. In this paper, we resolve this question by presenting the first linear-time FPT algorithm for Node Unique Label Cover.
In the Directed Feedback Vertex Set (DFVS) problem, the input is a directed graph $D$ on $n$ vertices and $m$ edges, and an integer $k$. The objective is to determine whether there exists a set of at most $k$ vertices intersecting every directed cycle of $D$. Whether or not DFVS admits a fixed parameter tractable (FPT) algorithm was considered the most important open problem in parameterized complexity until Chen, Liu, Lu, OSullivan and Razgon [JACM 2008] answered the question in the affirmative. They gave an algorithm for the problem with running time $O(k!4^kk^4nm)$. Since then, no faster algorithm for the problem has been found. In this paper, we give an algorithm for DFVS with running time $O(k!4^kk^5(n+m))$. Our algorithm is the first algorithm for DFVS with linear dependence on input size. Furthermore, the asymptotic dependence of the running time of our algorithm on the parameter $k$ matches up to a factor $k$ the algorithm of Chen, Liu, Lu, OSullivan and Razgon. On the way to designing our algorithm for DFVS, we give a general methodology to shave off a factor of $n$ from iterative-compression based algorithms for a few other well-studied covering problems in parameterized complexity. We demonstrate the applicability of this technique by speeding up by a factor of $n$, the current best FPT algorithms for Multicut [STOC 2011, SICOMP 2014] and Directed Subset Feedback Vertex Set [ICALP 2012, TALG 2014].