Do you want to publish a course? Click here

A Scalable Feature Selection and Opinion Miner Using Whale Optimization Algorithm

70   0   0.0 ( 0 )
 Added by Amir Javadpour
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

Due to the fast-growing volume of text documents and reviews in recent years, current analyzing techniques are not competent enough to meet the users needs. Using feature selection techniques not only support to understand data better but also lead to higher speed and also accuracy. In this article, the Whale Optimization algorithm is considered and applied to the search for the optimum subset of features. As known, F-measure is a metric based on precision and recall that is very popular in comparing classifiers. For the evaluation and comparison of the experimental results, PART, random tree, random forest, and RBF network classification algorithms have been applied to the different number of features. Experimental results show that the random forest has the best accuracy on 500 features. Keywords: Feature selection, Whale Optimization algorithm, Selecting optimal, Classification algorithm



rate research

Read More

68 - Bing Zeng , Liang Gao , Xinyu Li 2017
Increasing nature-inspired metaheuristic algorithms are applied to solving the real-world optimization problems, as they have some advantages over the classical methods of numerical optimization. This paper has proposed a new nature-inspired metaheuristic called Whale Swarm Algorithm for function optimization, which is inspired by the whales behavior of communicating with each other via ultrasound for hunting. The proposed Whale Swarm Algorithm has been compared with several popular metaheuristic algorithms on comprehensive performance metrics. According to the experimental results, Whale Swarm Algorithm has a quite competitive performance when compared with other algorithms.
Whale Optimization Algorithm (WOA) is a nature-inspired meta-heuristic optimization algorithm, which was proposed by Mirjalili and Lewis in 2016. This algorithm has shown its ability to solve many problems. Comprehensive surveys have been conducted about some other nature-inspired algorithms, such as ABC, PSO, etc.Nonetheless, no survey search work has been conducted on WOA. Therefore, in this paper, a systematic and meta analysis survey of WOA is conducted to help researchers to use it in different areas or hybridize it with other common algorithms. Thus, WOA is presented in depth in terms of algorithmic backgrounds, its characteristics, limitations, modifications, hybridizations, and applications. Next, WOA performances are presented to solve different problems. Then, the statistical results of WOA modifications and hybridizations are established and compared with the most common optimization algorithms and WOA. The surveys results indicate that WOA performs better than other common algorithms in terms of convergence speed and balancing between exploration and exploitation. WOA modifications and hybridizations also perform well compared to WOA. In addition, our investigation paves a way to present a new technique by hybridizing both WOA and BAT algorithms. The BAT algorithm is used for the exploration phase, whereas the WOA algorithm is used for the exploitation phase. Finally, statistical results obtained from WOA-BAT are very competitive and better than WOA in 16 benchmarks functions. WOA-BAT also outperforms well in 13 functions from CEC2005 and 7 functions from CEC2019.
Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.
Automatic email categorization is an important application of text classification. We study the automatic reply of email business messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and baseline categorization experiments using Naive Bayes and support Vector Machines. We then discuss the effect of lemmatization and the role of part-of-speech tagging filtering on precision and recall. Support Vector Machines classification coupled with nonlemmatized selection of verbs, nouns and adjectives was the best approach, with 87.3% maximum accuracy. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision in SVM and Naive Bayes respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faster approximations to the greedy forward selection were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve better performance by exploiting distributed computation and stochastic evaluation respectively. Both algorithms have provable performance guarantees for submodular functions. In this paper we show that divergent from previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms. Specifically, we show that a generalized concept of weak submodularity suffices to give multiplicative approximation guarantees. Our result extends the applicability of these algorithms to a larger class of functions. Furthermore, we show that a bounded submodularity ratio can be used to provide data dependent bounds that can sometimes be tighter also for submodular functions. We empirically validate our work by showing superior performance of fast greedy approximations versus several established baselines on artificial and real datasets.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا