A Comparative Study of R and RapidMiner as Data Mining Tools


Abstract in English

The ability of data mining to extract predictive information from huge databases has made it an effective tool in the hands of companies and individuals, allowing them to focus on what matters to them within the massive data generated in the course of their daily lives. As the importance of this field has grown, the number of tools built to put its theoretical concepts into practice has increased rapidly, which makes it hard to decide which of these tools is best suited to a given task. This study compares the two most commonly used data mining tools according to opinion polls, namely RapidMiner and the R programming language, in order to help researchers and developers choose the one that better suits their needs. The comparison is based on seven criteria: platform, algorithms, input/output formats, visualization, user evaluation, infrastructure and potential for development, and performance. Performance was assessed by applying a set of classification algorithms to a number of data sets, using two techniques to split each data set, cross-validation and hold-out, to confirm the results. The results show that R supports the largest number of algorithms, input/output formats, and visualization options, while RapidMiner is superior in ease of use and supports a greater number of platforms. In terms of performance, the classification models built with R packages achieved higher accuracy, although this did not hold in some cases because of the nature of the data, since we did not apply any pre-processing stage. Finally, the preferred tool depends on the user's level of experience and on the purpose for which the tool is used.
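The abstract mentions two ways of splitting a data set when measuring classification accuracy: hold-out and cross-validation. As a rough illustration of the difference (not the study's own procedure, which relied on R packages and RapidMiner operators), the Python sketch below implements both splitting schemes over row indices. The test fraction of one third, k = 10, the fixed random seed, and the majority-class predictor standing in for a real classifier are all assumptions chosen for the example, and helper names such as `holdout_split` and `kfold_splits` are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def holdout_split(n: int, test_fraction: float = 1 / 3, seed: int = 0) -> Split:
    """Shuffle row indices once and reserve a fixed fraction of them for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def kfold_splits(n: int, k: int = 10, seed: int = 0) -> List[Split]:
    """Partition shuffled indices into k folds; each fold serves as the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set = every fold except the current test fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def majority_class_predict(train_labels: Sequence[Hashable], n_test: int) -> list:
    """Hypothetical stand-in classifier: always predict the most frequent training class."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of test rows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def holdout_accuracy(labels: Sequence[Hashable], test_fraction: float = 1 / 3, seed: int = 0) -> float:
    """Train once on the hold-out training part and score once on the test part."""
    train, test = holdout_split(len(labels), test_fraction, seed)
    preds = majority_class_predict([labels[i] for i in train], len(test))
    return accuracy([labels[i] for i in test], preds)


def cross_val_accuracy(labels: Sequence[Hashable], k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over the k train/test rotations."""
    scores = []
    for train, test in kfold_splits(len(labels), k, seed):
        preds = majority_class_predict([labels[i] for i in train], len(test))
        scores.append(accuracy([labels[i] for i in test], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["yes"] * 14 + ["no"] * 6  # hypothetical toy class column
    print(holdout_accuracy(labels), cross_val_accuracy(labels, k=5))
```

Hold-out trains and tests once, so its estimate depends on which rows happen to land in the test set. K-fold cross-validation rotates the test fold so that every row is tested exactly once and averages the k scores, which is why running both offers a cross-check on the accuracy figures.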

