ﻻ يوجد ملخص باللغة العربية
We use over 350,000 Yelp reviews on 5,000 restaurants to perform an ablation study on text preprocessing techniques. We also compare the effectiveness of several machine learning and deep learning models on predicting user sentiment (negative, neutral, or positive). For machine learning models, we find that using binary bag-of-word representation, adding bi-grams, imposing minimum frequency constraints and normalizing texts have positive effects on model performance. For deep learning models, we find that using pre-trained word embeddings and capping maximum length often boost model performance. Finally, using macro F1 score as our comparison metric, we find simpler models such as Logistic Regression and Support Vector Machine to be more effective at predicting sentiments than more complex models such as Gradient Boosting, LSTM and BERT.
In the past few years, the growth of e-commerce and digital marketing in Vietnam has generated a huge volume of opinionated data. Analyzing those data would provide enterprises with insight for better business decisions. In this work, as part of the
We predict restaurant ratings from Yelp reviews based on Yelp Open Dataset. Data distribution is presented, and one balanced training dataset is built. Two vectorizers are experimented for feature engineering. Four machine learning models including N
Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder som
In aspect-based sentiment analysis, extracting aspect terms along with the opinions being expressed from user-generated content is one of the most important subtasks. Previous studies have shown that exploiting connections between aspect and opinion
Sentiment analysis is a highly subjective and challenging task. Its complexity further increases when applied to the Arabic language, mainly because of the large variety of dialects that are unstandardized and widely used in the Web, especially in so