ﻻ يوجد ملخص باللغة العربية
Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the two sequences. Extending this definition, by considering two $k$-mers to match if their distance is at most $m$, yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms to compute this similarity have computational complexity that render them applicable only for small values of $k$ and $m$. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of $k$ and $m$, and get higher predictive accuracy. This opens up a broad avenue of applying this classification approach to audio, images, and text sequences. Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process we solve an open combinatorial problem, which was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on quality and runtime of our algorithm and report its empirical performance on real world biological and music sequences datasets.
Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, social sciences, and others fields. While the alignment of two sequences may be performed swiftly in many applications, the simultaneous a
The restless bandit problem is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate
We give new approximation algorithms for the submodular joint replenishment problem and the inventory routing problem, using an iterative rounding approach. In both problems, we are given a set of $N$ items and a discrete time horizon of $T$ days in
In the relay placement problem the input is a set of sensors and a number $r ge 1$, the communication range of a relay. In the one-tier version of the problem the objective is to place a minimum number of relays so that between every pair of sensors
We present an $(e^{O(p)} frac{log ell}{loglogell})$-approximation algorithm for socially fair clustering with the $ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $ell$ grou