Sentiment Analysis for YouTube Comments in Roman Urdu


Abstract in English

Sentiment analysis is a vast area in the Machine learning domain. A lot of work is done on datasets and their analysis of the English Language. In Pakistan, a huge amount of data is in roman Urdu language, it is scattered all over the social sites including Twitter, YouTube, Facebook and similar applications. In this study the focus domain of dataset gathering is YouTube comments. The Dataset contains the comments of people over different Pakistani dramas and TV shows. The Dataset contains multi-class classification that is grouped The comments into positive, negative and neutral sentiment. In this Study comparative analysis is done for five supervised learning Algorithms including linear regression, SVM, KNN, Multi layer Perceptron and Naive Bayes classifier. Accuracy, recall, precision and F-measure are used for measuring performance. Results show that accuracy of SVM is 64 percent, which is better than the rest of the list.

Download