With the tremendous development in scientific, economic, political, and other fields, a need has emerged for nontraditional ways of handling data of all kinds (text, video, audio, etc.), whose volumes are now very large. New methods are needed to uncover the knowledge and information hidden within this huge amount of data, for example queries for customers who share the same purchasing habits, or for the sales prospects of a particular commodity in a given geographical area, along with other inferential queries based on data mining technology. Clustering, which comprises several algorithms, is one of the most important data mining methods.
In this research we focus on using a calculated method to create the initial centers for the K-Medoids algorithm, rather than selecting them at random, which leads to differing results and slow execution of the algorithm. K-Medoids is based on dividing the data into clusters, each of which holds a portion of the database that is easy to handle.
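The dependence of K-Medoids on its starting centers can be illustrated with a minimal sketch. The abstract does not describe how the initial medoids are calculated, so the function below simply takes them as an input; the names `assign` and `k_medoids` are hypothetical, and the sketch assumes the initial medoids are distinct points drawn from the data.

```python
from math import dist


def assign(points, medoids):
    """Group every point under its nearest medoid (ties go to the earlier medoid)."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return clusters


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing.

    Assumption: initial_medoids are distinct members of `points`, so no
    cluster is ever empty.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # The new medoid of a cluster is the member with the smallest
        # total distance to all other members of that cluster.
        updated = [
            min(members, key=lambda c: sum(dist(c, q) for q in members))
            for members in clusters.values()
        ]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
medoids, clusters = k_medoids(points, initial_medoids=[(0, 0), (2, 0)])
```

Starting from a poor pair of initial medoids, as here, the loop needs extra iterations before it settles, which is the cost that a calculated initialization aims to avoid.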
The K-Means algorithm classifies objects into a predefined number of clusters given by the user (say, k clusters). The idea is to choose random cluster centers, one for each cluster, preferably as far as possible from each other. The starting points affect both the clustering process and its results, so centroid initialization plays an important role in determining the cluster assignment effectively. The convergence behavior of the clustering also depends on the initial centroid values. This research focuses on the selection of the initial cluster centroids so as to improve the clustering performance of the K-Means algorithm. It assigns the cluster centroids using initial cluster centers derived from data partitioning along the data axis with the highest variance.
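A simplified sketch of initialization along the highest-variance axis is given below. It is an assumption-laden illustration, not the procedure used in the research: it sorts the points along the axis with the highest variance, cuts them into k contiguous parts of nearly equal size, and uses the mean of each part as an initial centroid. The function name `initial_centers` and the equal-size split are hypothetical choices.

```python
from statistics import fmean, pvariance


def initial_centers(points, k):
    """Return k initial centroids from a split along the highest-variance axis.

    Assumption: an equal-size contiguous split is used for simplicity;
    the original method may choose its cut points differently.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dims = len(points[0])
    # Pick the coordinate axis on which the data is most spread out.
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    centers = []
    for i in range(k):
        part = ordered[i * n // k:(i + 1) * n // k]
        centers.append(tuple(fmean(p[d] for p in part) for d in range(dims)))
    return centers


centers = initial_centers([(0, 0), (1, 0), (10, 1), (11, 1)], k=2)
```

Because the result depends only on the data, repeated runs of K-Means started from these centers give the same clustering, unlike random initialization.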
In this paper we present a comparison of several data mining algorithms for traffic accident analysis.
We start by describing the data available for entry, analyzing the structure of the statistical reports of the Lattakia traffic directorate, and then proceed
to the data mining stage, which enables an intelligent study of the factors that play a role in traffic accidents, their interrelations, and their importance in causing accidents.
This stage follows the construction of a data warehouse on top of the database we built to store the gathered data.
We list some of the models that were tested, a sample of the many cases we examined to obtain the research results.
Tracking with wireless sensor networks (WSNs) is an application that is experiencing
significant growth. Because wireless sensor networks have a limited energy source,
research continues to improve methods of routing and transferring information so as to
lower power consumption. In this research we have therefore improved the routing of
target location information within a WSN by proposing a new algorithm. It takes advantage
of the concept of clustering in wireless sensor networks and adds the possibility of
interaction between field sensors that belong to different clusters, which cannot
interact with each other in traditional clustered networks. To eliminate repeated
transfer of the same information, we rely on the strength of the signal received from the
target at the sensors, which has a positive effect on network lifetime and gives a more
accurate indication of the target location. We implemented the proposed algorithm and
present the results obtained with the OPNET simulator, one of the best simulators for
various types of networks.
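One way that received signal strength could suppress duplicate reports is sketched below. This is an assumption, not the algorithm proposed in the abstract, which does not give its details: among all sensors that detect the target, including sensors from different clusters, only the one with the strongest received signal forwards the report. The `Reading` type, the `select_reporter` function, and the detection threshold of -90 dBm are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    cluster_id: str
    rssi_dbm: float  # strength of the signal received from the target


def select_reporter(readings: List[Reading],
                    threshold_dbm: float = -90.0) -> Optional[Reading]:
    """Pick the single sensor that forwards the target report.

    Sensors below the (hypothetical) detection threshold are ignored.
    Readings from different clusters are compared together, mirroring
    the idea of interaction across cluster boundaries.
    """
    detected = [r for r in readings if r.rssi_dbm >= threshold_dbm]
    if not detected:
        return None
    return max(detected, key=lambda r: r.rssi_dbm)


readings = [
    Reading("a", "c1", -70.0),
    Reading("b", "c1", -60.0),
    Reading("c", "c2", -55.0),
    Reading("d", "c2", -95.0),
]
reporter = select_reporter(readings)
```

Only one message leaves the group of detecting sensors, and it comes from the sensor that is likely closest to the target, which is consistent with the energy and accuracy benefits the abstract describes.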
The main goal of the data mining process is to extract information and
discover knowledge from huge databases, and clustering is one of the
most important functions in this area. There are many clustering
algorithms and methods, but determining or estimating the number of
clusters that should be extracted from a dataset is one of the most
important issues that most of these methods face. This research
focuses on the problem of estimating the number of clusters in the
case of agglomerative hierarchical clustering. We present an
evaluation of three of the most common methods used for estimating
the number of clusters.
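The abstract does not name the three methods it evaluates. As an illustration of the problem only, the sketch below uses one common heuristic: run single-linkage agglomerative clustering, record the distance at which each merge happens, and stop before the merge that shows the largest jump in distance. The function names and the choice of single linkage are assumptions.

```python
from math import dist


def merge_heights(points):
    """Single-linkage agglomerative clustering; return merge distances in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest to each other.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights


def estimate_k(points):
    """Estimate the number of clusters from the largest gap between merges."""
    heights = merge_heights(points)
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    if not gaps:
        return 1
    m = max(range(len(gaps)), key=gaps.__getitem__)
    # Merges 0..m are kept, so len(points) - (m + 1) clusters remain.
    return len(points) - (m + 1)


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k = estimate_k(data)
```

This brute-force version is cubic or worse in the number of points and is meant only to make the idea concrete on small inputs.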
This paper introduces a new algorithm that solves some of the
problems from which data clustering algorithms such as K-Means
suffer. The new algorithm is able to cluster data by itself, without
the need for other clustering algorithms.
This work addresses how to choose the right way of dividing a high-dimensional data set into clusters in a specific field, compares the different subspace clustering algorithms, and presents their applications and usage.
Virtualization is the main structure and the most important component of cloud
computing. Because of features that only a virtual environment provides, such as
flexibility, cost and energy savings, and optimal use of resources, most businesses and
governments look to deploy their services and applications on virtual servers instead of
physical ones. This has led researchers to compare performance across different virtual
environments in order to find the environment best suited to cloud computing and to
high performance computing. We describe the effect of the evolution of the virtual
environment on high performance computing as a service by changing the type of virtual
environment used in the infrastructure. We first used the XEN-PV (XEN
paravirtualization) virtual environment as the infrastructure for high performance
computing, and then XEN-HVM (XEN hardware virtual machine). XEN is the main hypervisor
of the Citrix company. Performance was evaluated using DRBD as shared storage for the
virtual disks. The results of our research show that both the virtual environment and
the selected clustering schemes play a visible and important role in the performance of
high performance computing.
Educational data mining aims to study the data available in the educational field and extract hidden knowledge from it, in order to use this knowledge to enhance the education process and make successful decisions that improve students' academic performance. This study proposes the use of data mining techniques to improve student performance prediction. Three classification algorithms (Naïve Bayes, J48, and Support Vector Machine) were applied to the student performance database, and a new classifier was then designed to combine the results of those individual classifiers using the voting method. The WEKA tool, which supports many data mining algorithms and methods, was used. The results show that the ensemble classifier has the highest accuracy for predicting students' levels compared to the other classifiers, achieving an accuracy of 74.8084%. The simple k-means clustering algorithm was useful for grouping similar students into separate groups, which makes it possible to understand the characteristics of each group and helps in guiding each group separately.
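Combining classifier outputs by voting can be sketched as follows. The abstract does not state which combination rule was used, so plain majority voting over predicted labels is assumed here; the function name `majority_vote`, the tie-breaking rule, and the sample labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list of voted labels.

    `predictions` holds one list of labels per classifier, all of the
    same length. Ties go to the earliest classifier in the list whose
    label reached the top count (an assumed tie-breaking rule).
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        combined.append(next(label for label in labels if counts[label] == top))
    return combined


# Made-up predictions from three classifiers for three students.
naive_bayes = ["pass", "fail", "pass"]
j48 = ["pass", "pass", "fail"]
svm = ["fail", "pass", "fail"]
voted = majority_vote([naive_bayes, j48, svm])
```

With an odd number of classifiers and two classes, a tie cannot occur, so the tie-breaking rule matters only for multi-class labels or an even number of voters.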