
Improve K-Means Algorithm

تحسين خوارزميات K-Means (Improving K-Means Algorithms)

Publication date: 2014
Research language: Arabic





The K-Means algorithm partitions objects into a predefined number of clusters, k, supplied by the user. The idea is to choose one random center per cluster, preferably as far from the other centers as possible. These starting points affect both the clustering process and its results: centroid initialization plays an important role in determining cluster assignments effectively, and the convergence behavior of the algorithm depends on the initial centroid values. This research focuses on the selection of initial cluster centroids so as to improve the clustering performance of the K-Means algorithm. To assign the initial centroids, it uses initial cluster centers derived from partitioning the data along the axis with the highest variance.
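The initialization described above can be sketched as follows. This is a minimal illustration of the general idea only (sort the points along the dimension with the highest variance, split them into k contiguous partitions, and use each partition's mean as an initial center), not the paper's exact procedure; the function names are illustrative.

```python
import numpy as np

def init_centers_variance_split(X, k):
    """Initial centers from partitioning the data along the axis
    of highest variance (a sketch of the general idea)."""
    axis = np.argmax(X.var(axis=0))      # dimension with the highest variance
    order = np.argsort(X[:, axis])       # sort point indices along that axis
    parts = np.array_split(order, k)     # k contiguous partitions
    return np.array([X[idx].mean(axis=0) for idx in parts])

def kmeans(X, k, n_iter=100):
    """Standard K-Means iterations starting from the variance-split centers."""
    centers = init_centers_variance_split(X, k)
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Because the partitions follow the data's dominant spread instead of being drawn at random, repeated runs start from the same centers and produce the same clustering, which is the stability the abstract argues for.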


Related research

This paper introduces a new algorithm to solve some of the problems that data clustering algorithms such as K-Means suffer from. This new algorithm is able to cluster data by itself, without the need for other clustering algorithms.
With the tremendous development in all scientific, economic, political, and other fields, the need has emerged for non-traditional ways to deal with data of all patterns (text, video, audio, etc.), whose volumes have become very large. It was necessary to find new ways to extract the knowledge and information hidden within this huge amount of data, such as querying for customers who share the same purchasing habits, or the prospects for selling a particular commodity in a given geographical area, and other deductive queries based on data mining technology. Among the most important data mining methods is clustering, which comprises several algorithms. In this research we focus on using a computed method to create the initial centers of the K-Medoids algorithm, based on the principle of dividing the data into clusters, each containing an easy-to-handle replica of the database, rather than selecting the centers at random, which in turn leads to differing results and slow execution of the algorithm.
In this paper, we introduce a modification to the fuzzy mountain data clustering algorithm. We were able to make this algorithm work automatically, by finding a way to divide the space, determine the values of the input parameters, and set the stop condition automatically, instead of obtaining them from the user.
Following the success of dot-product attention in Transformers, numerous approximations have recently been proposed to address its quadratic complexity with respect to the input length. While these variants are memory- and compute-efficient, it is not possible to use them directly with popular pre-trained language models trained using vanilla attention without an expensive corrective pre-training stage. In this work, we propose a simple yet highly accurate approximation for vanilla attention. We process the queries in chunks, and for each query, compute the top-*k* scores with respect to the keys. Our approach offers several advantages: (a) its memory usage is linear in the input size, similar to linear attention variants such as Performer and RFA; (b) it is a drop-in replacement for vanilla attention that does not require any corrective pre-training; and (c) it can also lead to significant memory savings in the feed-forward layers after casting them into the familiar query-key-value framework. We evaluate the quality of top-*k* approximation for multi-head attention layers on the Long Range Arena Benchmark, and for feed-forward layers of T5 and UnifiedQA on multiple QA datasets. We show our approach leads to accuracy that is nearly identical to vanilla attention in multiple setups, including training from scratch, fine-tuning, and zero-shot inference.
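The mechanism described in this abstract can be sketched as follows. This is a minimal NumPy illustration of the stated idea (process queries in chunks; for each query, keep only the top-k scores against the keys and softmax over those), not the authors' implementation; the function name and chunk size are illustrative.

```python
import numpy as np

def topk_attention(Q, K, V, k, chunk=32):
    """Top-k approximation of dot-product attention.
    For each query, only the k largest key scores are kept and the
    softmax is taken over those; queries are processed in chunks so
    peak memory stays linear in the number of keys."""
    outs = []
    for start in range(0, Q.shape[0], chunk):
        q = Q[start:start + chunk]                           # (c, d)
        scores = q @ K.T                                     # (c, n)
        # indices of the k largest scores per query (unordered)
        idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        top = np.take_along_axis(scores, idx, axis=1)        # (c, k)
        # softmax restricted to the top-k scores
        w = np.exp(top - top.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # weighted sum of the corresponding value vectors
        outs.append(np.einsum('ck,ckd->cd', w, V[idx]))
    return np.vstack(outs)
```

When k equals the number of keys this reduces exactly to vanilla attention, which is why it can serve as a drop-in replacement; smaller k trades a little accuracy for linear memory.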
Jujeop is a type of pun and a unique way for fans to express their love for the K-pop stars they follow using Korean. One of the unique characteristics of Jujeop is its use of exaggerated expressions to compliment K-pop stars, which contain or lead to humor. Based on this characteristic, Jujeop can be separated into four distinct types, each with its own lexical collocations: (1) fragmenting words to create a twist, (2) homophones and homographs, (3) repetition, and (4) nonsense. Thus, the current study first defines the concept of Jujeop in Korean, manually labels 8.6K comments, and annotates each comment with one of the four Jujeop types. With the annotated corpus, this study presents the distinctive characteristics of Jujeop comments compared to other comments via a classification task. Moreover, using a clustering approach, we propose a structural dependency within each Jujeop type. We have made our dataset publicly available for future research on Jujeop expressions.
