Parameterized Complexity of Categorical Clustering with Size Constraints

Published by: Petr Golovach

Publication date: 2021

Research field: Informatics Engineering

Language: English

Authors: Fedor V. Fomin, Petr A. Golovach

Subjects: Data Structures and Algorithms, Discrete Mathematics

Abstract

In the Categorical Clustering problem, we are given a set of vectors (matrix) $A=\{a_1,\ldots,a_n\}$ over $\Sigma^m$, where $\Sigma$ is a finite alphabet, and integers $k$ and $B$. The task is to partition $A$ into $k$ clusters such that the median objective of the clustering in the Hamming norm is at most $B$. That is, we seek a partition $\{I_1,\ldots,I_k\}$ of $\{1,\ldots,n\}$ and vectors $c_1,\ldots,c_k\in\Sigma^m$ such that $\sum_{i=1}^{k}\sum_{j\in I_i}d_H(c_i,a_j)\leq B$, where $d_H(a,b)$ is the Hamming distance between vectors $a$ and $b$. Fomin, Golovach, and Panolan [ICALP 2018] proved that the problem is fixed-parameter tractable (for the binary case $\Sigma=\{0,1\}$) by giving an algorithm that solves the problem in time $2^{O(B\log B)}(mn)^{O(1)}$. We extend this algorithmic result to a popular capacitated clustering model, where in addition the sizes of the clusters should satisfy certain constraints. More precisely, in Capacitated Clustering, we are additionally given two non-negative integers $p$ and $q$, and seek a clustering with $p\leq |I_i|\leq q$ for all $i\in\{1,\ldots,k\}$. Our main theorem is that Capacitated Clustering is solvable in time $2^{O(B\log B)}|\Sigma|^B(mn)^{O(1)}$. The theorem not only extends the previous algorithmic results to a significantly more general model, it also implies algorithms for several other variants of Categorical Clustering with constraints on cluster sizes.
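For a fixed partition, the objective in the abstract is easy to evaluate, because the best center of a cluster under Hamming distance takes a most frequent symbol in each coordinate. The Python sketch below only verifies a candidate Capacitated Clustering solution; it is not the algorithm from the paper, and the names `cluster_cost` and `is_feasible`, as well as the 0-based index encoding, are assumptions made for illustration.

```python
from collections import Counter


def cluster_cost(vectors):
    """Median cost of one cluster under Hamming distance.

    The optimal center picks a most frequent symbol in every
    coordinate, so each column contributes (size - top frequency).
    """
    cost = 0
    for column in zip(*vectors):
        top_frequency = Counter(column).most_common(1)[0][1]
        cost += len(column) - top_frequency
    return cost


def is_feasible(A, partition, B, p, q):
    """Check a candidate partition against budget B and size bounds p..q.

    A is a list of equal-length strings (or tuples) over the alphabet;
    partition is a list of k lists of 0-based indices into A
    (hypothetical encoding chosen for this sketch).
    """
    indices = sorted(i for part in partition for i in part)
    if indices != list(range(len(A))):
        return False  # not a partition of all input vectors
    if any(not (p <= len(part) <= q) for part in partition):
        return False  # some cluster violates the size constraints
    total = sum(cluster_cost([A[i] for i in part]) for part in partition)
    return total <= B


A = ["0011", "0010", "1100", "1101", "1100"]
partition = [[0, 1], [2, 3, 4]]
print(is_feasible(A, partition, B=2, p=2, q=3))  # True: total cost is 2
print(is_feasible(A, partition, B=1, p=2, q=3))  # False: budget too small
```

Checking a given partition takes polynomial time; the difficulty addressed by the paper lies in finding a partition that meets both the budget and the size bounds.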

Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach (2021)

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers $\ell$ (the number of irrelevant features) and $k$ (the number of clusters), budget $B$, and a set of $n$ categorical data points (represented by $m$-dimensional vectors whose elements belong to a finite set of values $\Sigma$), we want to select $m-\ell$ relevant features such that the cost of any optimal $k$-clustering on these features does not exceed $B$. Here the cost of a cluster is the sum of Hamming distances ($\ell_0$-distances) between the selected features of the elements of the cluster and its center. The clustering cost is the total sum of the costs of the clusters. We use the framework of parameterized complexity to identify how the complexity of the problem depends on parameters $k$, $B$, and $|\Sigma|$. Our main result is an algorithm that solves the Feature Selection problem in time $f(k,B,|\Sigma|)\cdot m^{g(k,|\Sigma|)}\cdot n^2$ for some functions $f$ and $g$. In other words, the problem is fixed-parameter tractable parameterized by $B$ when $|\Sigma|$ and $k$ are constants. Our algorithm is based on a solution to a more general problem, Constrained Clustering with Outliers. We also complement our algorithmic findings with complexity lower bounds.

Data Structures and Algorithms, Discrete Mathematics

On the Parameterized Complexity of the Maximum Edge Coloring Problem

Prachi Goyal, Vikram Kamat, Neeldhara Misra (2013)

We investigate the parameterized complexity of the following edge coloring problem motivated by the problem of channel assignment in wireless networks. For an integer $q>1$ and a graph $G$, the goal is to find a coloring of the edges of $G$ with the maximum number of colors such that every vertex of the graph sees at most $q$ colors. This problem is NP-hard for $q>1$, and has been well-studied from the point of view of approximation. Our main focus is the case when $q=2$, which is already theoretically intricate and practically relevant. We show fixed-parameter tractable algorithms for both the standard and the dual parameter, and for the latter problem, the result is based on a linear vertex kernel.

Data Structures and Algorithms, Discrete Mathematics
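The constraint in the edge coloring problem above (every vertex sees at most q colors) can be checked directly for a candidate coloring. The following is a minimal Python sketch with a hypothetical function name and input encoding; it validates a coloring and counts its colors, and does not implement the FPT algorithms of the paper.

```python
from collections import defaultdict


def count_colors_if_valid(coloring, q):
    """Return the number of colors used, or None if some vertex
    is incident to edges of more than q distinct colors.

    coloring maps each edge (u, v) to its color (assumed encoding).
    """
    seen = defaultdict(set)
    for (u, v), color in coloring.items():
        seen[u].add(color)
        seen[v].add(color)
    if any(len(colors) > q for colors in seen.values()):
        return None
    return len(set(coloring.values()))


# A path a-b-c-d tolerates three colors when q = 2 ...
path = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3}
# ... but the center of a star with three leaves would see three colors.
star = {("x", "u"): 1, ("x", "v"): 2, ("x", "w"): 3}
print(count_colors_if_valid(path, q=2))  # 3
print(count_colors_if_valid(star, q=2))  # None
```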

Refined Notions of Parameterized Enumeration Kernels with Applications to Matching Cut Enumeration

Petr A. Golovach, Christian Komusiewicz, Dieter Kratsch (2021)

An enumeration kernel as defined by Creignou et al. [Theory Comput. Syst. 2017] for a parameterized enumeration problem consists of an algorithm that transforms each instance into one whose size is bounded by the parameter plus a solution-lifting algorithm that efficiently enumerates all solutions from the set of the solutions of the kernel. We propose to consider two n

Data Structures and Algorithms, Discrete Mathematics

Parameterized Complexity of the k-anonymity Problem

Stefano Beretta, Paola Bonizzoni, Gianluca Della Vedova (2009)

The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is $k$-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be APX-hard even when the record values are over a binary alphabet and $k=3$, and when the records have length at most 8 and $k=4$. In this paper we study how the complexity of the problem is influenced by different parameters. In this paper we follow this direction of research, first showing that the problem is W[1]-hard when parameterized by the size of the solution (and the value $k$). Then we exhibit a fixed-parameter algorithm, when the problem is parameterized by the size of the alphabet and the number of columns. Finally, we investigate the computational (and approximation) complexity of the $k$-anonymity problem, when restricting the instance to records having length bounded by 3 and $k=3$. We show that such a restriction is APX-hard.

Data Structures and Algorithms, Databases, Discrete Mathematics
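The suppression objective of $k$-anonymity described above can be evaluated for a fixed clustering: in each cluster, every column on which the rows disagree has to be suppressed in all rows of that cluster, since otherwise the rows would not become the same tuple. The Python sketch below computes this cost under that reading; the function name `suppression_cost` and the input encoding are hypothetical, and the code is an evaluator for illustration, not one of the algorithms from the paper.

```python
def suppression_cost(rows, clusters, k):
    """Number of suppressed entries for a given clustering, or None
    if some cluster has fewer than k rows.

    rows is a list of equal-length strings (or tuples); clusters is a
    list of lists of 0-based row indices, assumed to partition the rows.
    """
    total = 0
    for cluster in clusters:
        if len(cluster) < k:
            return None  # violates the anonymity requirement
        members = [rows[i] for i in cluster]
        # Columns where the rows of this cluster do not all agree.
        differing = sum(1 for column in zip(*members) if len(set(column)) > 1)
        total += differing * len(members)
    return total


rows = ["0101", "0111", "1000", "1000", "0100"]
clusters = [[0, 1, 4], [2, 3]]
print(suppression_cost(rows, clusters, k=2))  # 6: two columns in a 3-row cluster
print(suppression_cost(rows, clusters, k=3))  # None: second cluster is too small
```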

Parameterized Algorithms for Maximum Cut with Connectivity Constraints

Hiroshi Eto, Tesshu Hanaka, Yasuaki Kobayashi (2019)

We study two variants of Maximum Cut, which we call Connected Maximum Cut and Maximum Minimal Cut, in this paper. In these problems, given an unweighted graph, the goal is to compute a maximum cut satisfying some connectivity requirements. Both problems are known to be NP-complete even on planar graphs, whereas Maximum Cut on planar graphs is solvable in polynomial time. We first show that these problems are NP-complete even on planar bipartite graphs and split graphs. Then we give parameterized algorithms using graph parameters such as clique-width, tree-width, and twin-cover number. Finally, we obtain FPT algorithms with respect to the solution size.

Data Structures and Algorithms
