Improve K-Means Algorithm

تحسين خوارزميات K-Means

Added by Al-Baath University. Type: research paper

Publication date 2014

Language of the research: Arabic

Authors: محمد حجوز (researcher)

Created by Shamra Editor

Tags: fuzzy system, facial expression, partitioning algorithm, clustering, centroid, K-Means

Abstract

The K-Means algorithm assigns objects to a predefined number of clusters, k, supplied by the user. The idea is to choose random cluster centers, one per cluster, preferably as far from each other as possible. The starting points affect both the clustering process and its results, so centroid initialization plays an important role in determining cluster assignments effectively, and the convergence behavior of the clustering also depends on the initial centroid values. This research focuses on the selection of initial cluster centroids in order to improve the clustering performance of the K-Means algorithm. It assigns the centroids using initial cluster centers derived from partitioning the data along the data axis with the highest variance.
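
The abstract does not spell out the partitioning procedure, so the following is only a simplified sketch of the general idea, not the paper's method. It assumes the data is sorted along the axis with the highest variance and cut into k contiguous groups of near-equal size, with each group's mean used as an initial centroid. The function names and the toy data are hypothetical.

```python
from statistics import pvariance


def highest_variance_axis(points):
    """Return the index of the coordinate axis with the largest variance."""
    dims = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(dims)]
    return max(range(dims), key=lambda d: variances[d])


def initial_centroids(points, k):
    """Assumed simplification: sort along the highest-variance axis, cut the
    sorted data into k contiguous groups, and return the mean of each group."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    axis = highest_variance_axis(points)
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    dims = len(ordered[0])
    centroids = []
    for i in range(k):
        # Integer arithmetic keeps every group non-empty when k <= n.
        group = ordered[i * n // k:(i + 1) * n // k]
        centroids.append(
            tuple(sum(p[d] for p in group) / len(group) for d in range(dims))
        )
    return centroids


# Hypothetical toy data: values spread far more along x than along y.
points = [(1.0, 1.0), (2.0, 1.5), (9.0, 1.2), (10.0, 0.8), (20.0, 1.1), (21.0, 0.9)]
centroids = initial_centroids(points, 3)
```

Because the starting centroids are computed from the data rather than drawn at random, repeated runs on the same data start from the same place, which is the property the abstract is after.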

Research summary

This research paper, prepared by the researcher محمد مصطفى حجّوز, addresses improving the K-Means algorithm used in data mining. As data volumes grow across many fields, new techniques are needed to handle them. K-Means is one of the best-known partitional clustering algorithms; it groups similar objects into clusters based on their attributes. The traditional K-Means algorithm chooses the cluster centers at random, which affects the effectiveness of the clustering process and its results. The research focuses on improving the algorithm's performance by improving how the initial cluster centers are chosen, using initial cluster centers derived from partitioning the data along the data axis with the highest variance. The paper gives detailed steps for the traditional and improved algorithms, along with illustrative examples of how they work. The improved algorithm was evaluated on several different datasets, and the results showed that it needs fewer iterations and less time than the traditional algorithm. The paper concludes that the improved algorithm outperforms the traditional one while keeping the same level of computational complexity.
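
To make the iteration count mentioned in the summary concrete, here is a minimal sketch of the standard (Lloyd-style) K-Means loop. It is a generic textbook formulation, not code from the paper; the function names, the `max_iter` parameter, and the toy data are assumptions. The loop takes its starting centroids as an argument, so any initialization strategy can be plugged in and the resulting iteration counts compared.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(group):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(group[0])
    return tuple(sum(p[d] for p in group) / len(group) for d in range(dims))


def k_means(points, start_centroids, max_iter=100):
    """Return (centroids, labels, iterations), where iterations counts the
    passes in which at least one cluster assignment changed."""
    centroids = [tuple(c) for c in start_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return centroids, labels, iteration - 1
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels, max_iter


# Hypothetical toy data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Hand-picked starting centroids.
final_centroids, labels, iterations = k_means(points, [(0.0, 0.0), (10.0, 10.0)])

# Traditional random start: k points drawn from the data (seeded for repeatability).
random_start = random.Random(0).sample(points, 2)
_, random_labels, random_iterations = k_means(points, random_start)
```

Comparing `iterations` across different starting centroids on the same data is one simple way to observe the effect the summary describes, although a toy example like this says nothing about the paper's actual measurements.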

Critical review

The paper is an important step toward improving the K-Means algorithm, but several points are open to discussion. First, although the paper shows a clear improvement in the algorithm's performance, the evaluation relies on a specific set of data, which raises the question of how effective the improved algorithm is on other, more varied datasets. Second, the paper does not adequately address how to handle categorical or non-numeric data, which can be a challenge in real-world applications. Third, a deeper analysis of how outliers affect the improved algorithm's performance would have been possible. Finally, further comparative studies against other clustering algorithms may be needed for a fuller understanding of the improved algorithm's performance.

Questions related to the research

What is the main problem the paper seeks to solve?

The paper seeks to improve the performance of the K-Means algorithm by improving how the initial cluster centers are chosen, reducing randomness and increasing clustering accuracy.

How are the initial cluster centers chosen in the improved algorithm?

The improved algorithm uses initial cluster centers derived from partitioning the data along the data axis with the highest variance.

What are the main benefits of the improved algorithm compared with the traditional one?

The main benefits are fewer required iterations and less elapsed time, which improves the algorithm's efficiency and overall performance.

Did the paper address how to handle categorical or non-numeric data?

The paper does not adequately address how to handle categorical or non-numeric data, which can be a challenge in real-world applications.

Keywords

K-Means, algorithm improvement, data mining, clustering, artificial intelligence

A New Algorithm for Data Clustering and Enhancing K-Means Algorithm

3199 - Al-Baath University 2016 Research paper

This paper introduces a new algorithm to solve some problems that data clustering algorithms such as K-Means suffer from. This new algorithm by itself is able to cluster data without the need of other clustering algorithms.

Clustering, Centroid, Data Mining, Objects, Isolated Points, Square Error

Assigned Elementary Centroids Thoughtfully in K-Medoids Algorithm

2384 - Al-Baath University 2014 Research paper

With the tremendous development in scientific, economic, political, and other fields, the need has arisen for nontraditional ways of handling all types of data (text, video, audio, etc.), which now come in very large volumes. New methods were needed to uncover the knowledge and information hidden in this huge amount of data, for example queries about customers with the same purchasing habits or about the sales prospects of a particular commodity in a given geographical area, along with other deductive queries based on data mining technology. Data mining uses several methods, one of the most important being clustering, for which several algorithms exist. This research focuses on a calculated way of creating the initial centers for the K-Medoids algorithm, which divides the data into clusters, each containing a manageable subset of the data, rather than selecting those centers at random, which leads to differing results and slow execution of the algorithm.

Partitioning algorithm, Clustering, Centroids, K-Medoids

Modifying Mountain Clustering Algorithm and Using It to Enhance the Performance of Fuzzy C-Means Algorithm

1563 - Al-Baath University 2017 Research paper

In this paper, we introduce a modification to the fuzzy mountain data clustering algorithm. We made the algorithm work automatically by finding a way to divide the space and to determine the input parameter values and the stop condition automatically, instead of obtaining them from the user.

Cost Function, Membership Matrix, Mountain Function, Fuzzy Clustering Algorithm, Input Parameters

Memory-efficient Transformers via Top-k Attention

474 - Association for Computational Linguistics 2021 Article

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible to directly use them with popular pre-trained language models trained using vanilla attention, without an expensive corrective pre-training stage. In this work, we propose a simple yet highly accurate approximation for vanilla attention. We process the queries in chunks, and for each query, compute the top-*k* scores with respect to the keys. Our approach offers several advantages: (a) its memory usage is linear in the input size, similar to linear attention variants, such as Performer and RFA (b) it is a drop-in replacement for vanilla attention that does not require any corrective pre-training, and (c) it can also lead to significant memory savings in the feed-forward layers after casting them into the familiar query-key-value framework. We evaluate the quality of top-*k* approximation for multi-head attention layers on the Long Range Arena Benchmark, and for feed-forward layers of T5 and UnifiedQA on multiple QA datasets. We show our approach leads to accuracy that is nearly-identical to vanilla attention in multiple setups including training from scratch, fine-tuning, and zero-shot inference.

memory-efficient transformers, transformers via top-k, top-k attention

Jujeop: Korean Puns for K-pop Stars on Social Media

584 - Association for Computational Linguistics 2021 Article

Jujeop is a type of pun and a unique way for fans to express their love for the K-pop stars they follow using Korean. One of the unique characteristics of Jujeop is its use of exaggerated expressions to compliment K-pop stars, which contain or lead to humor. Based on this characteristic, Jujeop can be separated into four distinct types, each with its own lexical collocations: (1) Fragmenting words to create a twist, (2) Homophones and homographs, (3) Repetition, and (4) Nonsense. Thus, the current study first defines the concept of Jujeop in Korean, manually labels 8.6K comments, and annotates each comment with one of the four Jujeop types. With the given annotated corpus, this study presents distinctive characteristics of Jujeop comments compared to other comments through a classification task. Moreover, with the clustering approach, we propose a structural dependency within each Jujeop type. We have made our dataset publicly available for future research on Jujeop expressions.

k-pop stars, jujeop
