
Hybrid Model For Word Prediction Using Naive Bayes and Latent Information

Published by Guilherme Wachs-Lopes
Publication date: 2018
Research field: Informatics Engineering
Language: English





Historically, the Natural Language Processing area has received considerable attention from researchers. One of the main motivations behind this interest is the word prediction problem: given a set of words in a sentence, recommend the next word. In the literature, this problem is addressed by methods based on syntactic or semantic analysis. On its own, neither kind of analysis achieves practical results for end-user applications. For instance, Latent Semantic Analysis can handle the semantic features of a text, but cannot suggest words according to syntactic rules. On the other hand, there are models that combine both analyses and achieve state-of-the-art results, e.g. Deep Learning; such models can demand high computational effort, which can make them infeasible for certain types of applications. With advances in technology and mathematical models, it is possible to develop faster and more accurate systems. This work proposes a hybrid word suggestion model, based on Naive Bayes and Latent Semantic Analysis, that considers the neighbouring words around unfilled gaps. Results show that this model achieves 44.2% accuracy on the MSR Sentence Completion Challenge.
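To make the combination concrete, below is a minimal sketch of one way a Naive Bayes term over the words neighbouring a gap could be mixed with an LSA cosine similarity. The toy corpus, the +/-2-word window, the rank-2 SVD, and the mixing weight alpha are all illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: hybrid gap-filling score = Naive Bayes over neighbouring words
# mixed with LSA cosine similarity. Toy corpus, +/-2 window, rank-2 SVD,
# and the weight `alpha` are illustrative assumptions.
from collections import Counter
import numpy as np

docs = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played on the mat",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document matrix, reduced by SVD for the latent (LSA) word vectors.
A = np.zeros((len(vocab), len(docs)))
for j, t in enumerate(tokens):
    for w in t:
        A[idx[w], j] += 1
U, s, _ = np.linalg.svd(A, full_matrices=False)
word_vecs = U[:, :2] * s[:2]

# Co-occurrence counts within a +/-2 window for the Naive Bayes term.
prior, cooc = Counter(), {w: Counter() for w in vocab}
for t in tokens:
    for i, w in enumerate(t):
        prior[w] += 1
        cooc[w].update(t[max(0, i - 2):i] + t[i + 1:i + 3])

def nb_score(cand, context):
    # log P(cand) + sum_c log P(c | cand), with add-one smoothing
    total = sum(cooc[cand].values()) + len(vocab)
    score = np.log(prior[cand] / sum(prior.values()))
    return score + sum(np.log((cooc[cand][c] + 1) / total) for c in context)

def lsa_score(cand, context):
    # cosine similarity between candidate vector and the context centroid
    ctx = np.mean([word_vecs[idx[c]] for c in context], axis=0)
    v = word_vecs[idx[cand]]
    return float(ctx @ v) / (np.linalg.norm(ctx) * np.linalg.norm(v) + 1e-9)

def suggest(cands, context, alpha=0.5):
    # Convex mixture of the two scores after min-max normalisation.
    nb = np.array([nb_score(c, context) for c in cands])
    ls = np.array([lsa_score(c, context) for c in cands])
    nb = (nb - nb.min()) / (np.ptp(nb) + 1e-9)
    ls = (ls - ls.min()) / (np.ptp(ls) + 1e-9)
    return cands[int(np.argmax(alpha * nb + (1 - alpha) * ls))]

# Fill the gap in "the ___ sat on the mat" from three candidates.
print(suggest(["cat", "dog", "rug"], ["the", "sat", "on", "the", "mat"]))
```

A convex mixture with a tunable weight is only one way to combine the two scores; in practice the weight would be chosen on held-out data.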


Read also

As the amount of online text increases, so does the demand for text categorization to aid the analysis and management of that text. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic categorization of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. Text categorization using association rules and a Naive Bayes classifier is proposed here. Instead of using individual words, word relations, i.e. association rules derived from those words, are used to build the feature set from pre-classified text documents. A Naive Bayes classifier is then applied to the derived features for the final categorization.
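A minimal sketch of this pipeline, assuming a toy labelled corpus and a crude frequent-pair miner in place of a full association-rule algorithm such as Apriori; the examples and the support threshold are illustrative.

```python
# Sketch: word-pair "association" features feeding a multinomial Naive Bayes.
# Toy corpus, min-support threshold, and the pair miner are assumptions;
# a real system would use a proper Apriori/FP-growth implementation.
from collections import Counter
from itertools import combinations
import math

train = [
    ("buy cheap pills now", "spam"),
    ("cheap pills buy today", "spam"),
    ("meeting notes for the project", "ham"),
    ("project meeting moved to friday", "ham"),
]

# 1. Mine frequently co-occurring word pairs (support >= 2) as features.
pair_counts = Counter()
for text, _ in train:
    for p in combinations(sorted(set(text.split())), 2):
        pair_counts[p] += 1
features = [p for p, n in pair_counts.items() if n >= 2]

def featurize(text):
    words = set(text.split())
    return [p for p in features if p[0] in words and p[1] in words]

# 2. Multinomial Naive Bayes over the derived pair features.
class_counts, feat_counts = Counter(), {}
for text, label in train:
    class_counts[label] += 1
    feat_counts.setdefault(label, Counter()).update(featurize(text))

def classify(text):
    scores = {}
    for label in class_counts:
        total = sum(feat_counts[label].values()) + len(features)
        s = math.log(class_counts[label] / len(train))
        for f in featurize(text):
            s += math.log((feat_counts[label][f] + 1) / total)  # Laplace
        scores[label] = s
    return max(scores, key=scores.get)

print(classify("cheap pills"))       # expected: spam
print(classify("project meeting"))   # expected: ham
```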
Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set with more than $1.6$ million training points and about $12$ million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
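The exact combinatorial solution is the paper's contribution; as a hedged illustration of the general idea only, the sketch below ranks binary features by a Laplace-smoothed Bernoulli Naive Bayes likelihood gain and keeps the top k. The scoring formula and the random data are assumptions, not the paper's objective.

```python
# Sketch: rank binary features by their Bernoulli NB log-likelihood gain
# over a class-independent baseline, then keep the top k. A simplified
# surrogate for sparse naive Bayes feature selection, not the exact method.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 10
X = (rng.random((n, d)) < 0.3).astype(int)   # binary document-term matrix
y = rng.integers(0, 2, n)                    # two class labels

def per_feature_ll(ones, n_rows):
    # Laplace-smoothed Bernoulli log-likelihood, one value per feature.
    p = (ones + 1) / (n_rows + 2)
    return ones * np.log(p) + (n_rows - ones) * np.log(1 - p)

# Class-conditional likelihood (the NB model) ...
cond = sum(per_feature_ll(X[y == c].sum(0), int((y == c).sum())) for c in (0, 1))
# ... versus a class-independent baseline fit on the pooled data.
pooled = per_feature_ll(X.sum(0), n)

gain = cond - pooled                 # how much each feature "buys" the NB model
selected = np.argsort(gain)[-k:]     # keep the k most informative features
print(sorted(selected.tolist()))
```

Because this surrogate score decomposes per feature, keeping the top k is exact for the surrogate, which echoes why the binary case of the paper's objective admits an exact solution.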
This paper presents a new Bayesian non-parametric model that extends Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The inference algorithm of the model collects words into a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.
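The model's non-parametric inference is well beyond a short sketch; as a loose stand-in that shows only the clustering criterion (words grouped when their distributions over documents are similar), here is ordinary hierarchical agglomerative clustering on a toy corpus. This is a swapped-in technique, not the paper's algorithm.

```python
# Loose illustration only: agglomerative clustering of words by the
# similarity of their distributions over documents. This stands in for the
# paper's Bayesian non-parametric inference, which it does not implement.
import numpy as np
from scipy.cluster.hierarchy import linkage

docs = [
    "stock market trading shares",
    "market shares fell in trading",
    "patient scan shows lesion",
    "radiology scan of the patient",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})

# Word-over-document distributions (each row sums to 1).
M = np.array([[t.count(w) for t in tokens] for w in vocab], float)
M /= M.sum(axis=1, keepdims=True)

# Ward linkage builds a tree over words; cutting it gives nested clusters.
tree = linkage(M, method="ward")
print(vocab)
print(tree[:3])   # first three merges: (idx_a, idx_b, distance, size)
```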
139 - Bei Shi, Wai Lam, Shoaib Jameel 2017
Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such two-step methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
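As a schematic of what topic-specific embeddings look like once learned (not the STE training procedure), the sketch below gives each (word, topic) pair its own vector and scores a context word as a skip-gram-style mixture over topics. All dimensions and distributions are assumptions.

```python
# Schematic of topic-specific word embeddings: each (word, topic) pair gets
# its own vector, and a word occurrence is scored as a mixture over topics.
# Sizes, random parameters, and the softmax scoring are assumptions; this
# shows the parameterisation, not how STE is trained.
import numpy as np

rng = np.random.default_rng(1)
V, K, D = 1000, 8, 50                      # vocab size, topics, embedding dim

in_vecs = rng.normal(0, 0.1, (K, V, D))    # topic-specific input embeddings
out_vecs = rng.normal(0, 0.1, (V, D))      # shared output embeddings
doc_topics = np.full(K, 1.0 / K)           # P(topic | document), here uniform

def score(center, context, doc_topics):
    """P(context | center, doc), marginalised over topics (skip-gram style)."""
    total = 0.0
    for k in range(K):
        logits = out_vecs @ in_vecs[k, center]       # one logit per vocab word
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                         # softmax over the vocab
        total += doc_topics[k] * probs[context]
    return total

print(score(center=3, context=7, doc_topics=doc_topics))
```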
63 - Ted Pedersen 2000
This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Despite the simplicity of this approach, empirical results for disambiguating the widely studied nouns "line" and "interest" show that such an ensemble achieves accuracy rivaling the best previously published results.
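A minimal sketch of the ensemble idea, assuming a tiny sense-labelled toy set for the noun "line": one Naive Bayes classifier per context-window size, combined by majority vote. The window sizes and examples are illustrative.

```python
# Sketch: an ensemble of Naive Bayes word-sense classifiers, one per
# context-window size, combined by majority vote. Toy data and window
# sizes are illustrative assumptions.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

data = [
    ("wait in the line at the bank counter", "queue"),
    ("a long line of people waiting outside", "queue"),
    ("the phone line was busy all morning", "phone"),
    ("static on the line during the call", "phone"),
]

def window(text, target="line", size=2):
    # Keep only the words within `size` positions of the ambiguous noun.
    toks = text.split()
    i = toks.index(target)
    return " ".join(toks[max(0, i - size):i] + toks[i + 1:i + size + 1])

windows = [1, 2, 4]
models = []
for w in windows:
    texts = [window(t, size=w) for t, _ in data]
    vec = CountVectorizer().fit(texts)
    clf = MultinomialNB().fit(vec.transform(texts), [y for _, y in data])
    models.append((vec, clf))

def predict(text):
    votes = Counter(
        clf.predict(vec.transform([window(text, size=w)]))[0]
        for (vec, clf), w in zip(models, windows)
    )
    return votes.most_common(1)[0][0]

print(predict("people stood in a line at the counter"))   # expected: queue
```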