Authorship identification is the task of determining who wrote a given text. Most well-known literary texts are easily attributed to an author because they are, for example, signed. Sometimes, however, unfinished works or whole collections of manuscripts surface with many candidate authors, and assessing the importance of such a manuscript requires knowing who wrote it. In this work, we develop a machine learning framework for determining authorship. We formulate the task as a single-label, multi-class text categorization problem and propose a supervised machine learning framework that incorporates stylometric features. The task is highly interdisciplinary, drawing on machine learning, information retrieval, and natural language processing. We present an approach and a model that learns the differences in writing style between $50$ authors and predicts the author of a new text with high accuracy. Accuracy increases significantly when certain linguistic stylometric features are added to the text features.
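As a rough illustration of the formulation above (single-label, multi-class classification over stylometric features), the following minimal sketch maps each text to a small vector of style markers and assigns a new text to the author with the nearest average vector. It is not the model described in the abstract: the feature set, the `FUNCTION_WORDS` list, the function names, and the nearest-centroid classifier are all assumptions chosen to keep the example self-contained.

```python
import math
import re
import string
from collections import defaultdict

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; a real system would use a much larger one).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "was")


def stylometric_features(text):
    """Map a text to a fixed-length vector of simple style markers."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    features = [
        sum(len(w) for w in words) / n_words,   # mean word length
        len(words) / max(len(sentences), 1),    # mean sentence length in words
        len(set(words)) / n_words,              # type-token ratio
        sum(c in string.punctuation for c in text) / max(len(text), 1),  # punctuation rate
    ]
    # Relative frequency of each function word.
    features += [words.count(fw) / n_words for fw in FUNCTION_WORDS]
    return features


def train_centroids(labelled_texts):
    """Average the feature vectors of each author's training texts."""
    sums = {}
    counts = defaultdict(int)
    for author, text in labelled_texts:
        vec = stylometric_features(text)
        previous = sums.get(author, [0.0] * len(vec))
        sums[author] = [p + v for p, v in zip(previous, vec)]
        counts[author] += 1
    return {a: [s / counts[a] for s in total] for a, total in sums.items()}


def predict_author(centroids, text):
    """Return the single author whose centroid is closest in Euclidean distance.

    Features are left unscaled here for brevity, so large-valued features
    such as sentence length dominate; standardizing them first would be a
    sensible refinement.
    """
    vec = stylometric_features(text)
    return min(centroids, key=lambda a: math.dist(centroids[a], vec))
```

Calling `train_centroids` on a list of `(author, text)` pairs and then `predict_author` on an unseen text yields exactly one label per text, which matches the single-label, multi-class setting. In practice, such stylometric vectors would typically be combined with text features and passed to a stronger supervised classifier.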
Authorship attribution (AA), which is the task of finding the owner of a given text, is an important and widely studied research topic with many applications. Recent works have shown that deep learning methods could achieve significant accuracy impro
The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of those focu
We predict restaurant ratings from Yelp reviews using the Yelp Open Dataset. The data distribution is presented, and a balanced training dataset is built. Two vectorizers are tested for feature engineering. Four machine learning models including N
Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark dat
In the last decade, a variety of topic models have been proposed for text engineering. However, except for Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), most existing topic models are seldom applied or considered