ﻻ يوجد ملخص باللغة العربية
Automatic email categorization is an important application of text classification. We study the automatic reply of email business messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and baseline categorization experiments using Naive Bayes and support Vector Machines. We then discuss the effect of lemmatization and the role of part-of-speech tagging filtering on precision and recall. Support Vector Machines classification coupled with nonlemmatized selection of verbs, nouns and adjectives was the best approach, with 87.3% maximum accuracy. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision in SVM and Naive Bayes respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
In personal email search, user queries often impose different requirements on different aspects of the retrieved emails. For example, the query my recent flight to the US requires emails to be ranked based on both textual contents and recency of the
Person-job fit is to match candidates and job posts on online recruitment platforms using machine learning algorithms. The effectiveness of matching algorithms heavily depends on the learned representations for the candidates and job posts. In this p
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally una
In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approa
Due to the fast-growing volume of text documents and reviews in recent years, current analyzing techniques are not competent enough to meet the users needs. Using feature selection techniques not only support to understand data better but also lead t