
Effectiveness and Limitations of Statistical Spam Filters

Published by: M. Tariq Banday
Publication date: 2009
Research field: Informatics Engineering
Paper language: English





In this paper we discuss the techniques involved in the design of well-known statistical spam filters, including Naive Bayes, Term Frequency-Inverse Document Frequency, K-Nearest Neighbor, Support Vector Machine, and Bayes Additive Regression Tree. We compare these techniques with one another in terms of accuracy, recall, precision, and related measures. Further, we discuss the effectiveness and limitations of statistical filters in filtering various types of spam out of legitimate e-mail.
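The Naive Bayes approach surveyed above can be illustrated with a minimal multinomial classifier using Laplace smoothing; the toy messages, tokenization, and function names below are hypothetical examples for illustration, not taken from the paper:

```python
import math
from collections import Counter

def train_nb(messages, labels):
    """Train a multinomial Naive Bayes model.
    messages: list of token lists; labels: 'spam' or 'ham'."""
    counts = {'spam': Counter(), 'ham': Counter()}
    priors = Counter(labels)
    for tokens, label in zip(messages, labels):
        counts[label].update(tokens)
    vocab = set(counts['spam']) | set(counts['ham'])
    return counts, priors, vocab

def classify(tokens, counts, priors, vocab):
    """Return the label with the higher log-posterior,
    using add-one (Laplace) smoothing for unseen words."""
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in ('spam', 'ham'):
        lp = math.log(priors[label] / total)          # log prior
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training corpus (hypothetical data)
train = [("win cash prize now".split(), 'spam'),
         ("free offer click now".split(), 'spam'),
         ("meeting agenda attached".split(), 'ham'),
         ("lunch tomorrow with team".split(), 'ham')]
msgs, labels = zip(*train)
counts, priors, vocab = train_nb(msgs, labels)
print(classify("free cash offer".split(), counts, priors, vocab))       # spam
print(classify("team meeting tomorrow".split(), counts, priors, vocab)) # ham
```

Evaluating such a classifier on a held-out set with accuracy, precision, and recall, as the paper does, then amounts to counting true/false positives against known labels.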


Read also

D. Oser, F. Mazeas, X. Le Roux (2018)
Selective on-chip optical filters with high rejection levels are key components for a wide range of advanced photonic circuits. However, maximum achievable rejection in state-of-the-art on-chip devices is seriously limited by phase errors arising from fabrication imperfections. Due to coherent interactions, unwanted phase-shifts result in detrimental destructive interferences that distort the filter response, whatever the chosen strategy (resonators, interferometers, Bragg filters, etc.). Here we propose and experimentally demonstrate a radically different approach to overcome this fundamental limitation, based on coherency-broken Bragg filters. We exploit non-coherent interaction among modal-engineered waveguide Bragg gratings separated by single-mode waveguides to yield effective cascading, even in the presence of fabrication errors. This technologically independent approach allows seamless combination of filter stages with moderate performance, providing a dramatic increase of on-chip rejection. Based on this concept, we experimentally demonstrate on-chip non-coherent cascading of Si Bragg filters with a record light rejection exceeding 80 dB in the C-band.
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting labeled training examples from a large pool of unlabeled instances. In this paper, we propose a new data-driven approach to active learning by choosing a small set of labeled data points that are both informative and representative. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, signify the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
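Core-set selection of the kind this abstract describes is often approximated with the greedy k-center heuristic: repeatedly pick the point farthest from everything already selected. The sketch below is a generic, hypothetical illustration of that heuristic on raw coordinates, not the paper's VAE-based method:

```python
def dist(a, b):
    """Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_core_set(points, k):
    """Greedy k-center selection: start from the first point, then
    repeatedly add the point farthest from the current selection.
    Returns the indices of the k selected points."""
    selected = [0]
    # dists[i] = distance from point i to its nearest selected point
    dists = [dist(p, points[0]) for p in points]
    for _ in range(k - 1):
        idx = max(range(len(points)), key=lambda i: dists[i])
        selected.append(idx)
        for i, p in enumerate(points):
            d = dist(p, points[idx])
            if d < dists[i]:
                dists[i] = d
    return selected

# Two tight clusters plus an outlier: the heuristic spreads its picks
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 5.0)]
print(greedy_core_set(pts, 3))  # [0, 3, 4]
```

In the paper's setting, the points would instead be latent codes produced by the trained VAE encoder, so that diversity is measured in the learned low-dimensional space.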
Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent works showed that the aggregated posterior may fail to match unit Gaussian prior, thus learning the prior becomes an alternative way to improve the lower-bound. In this paper, for the first time in the literature, we prove the necessity and effectiveness of learning the prior when aggregated posterior does not match unit Gaussian prior, analyze why this situation may happen, and propose a hypothesis that learning the prior may improve reconstruction loss, all of which are supported by our extensive experiment results. We show that using learned Real NVP prior and just one latent variable in VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAE, outperforming many previous works with complex hierarchical VAE architectures.
In this paper an attempt is made to review technological, economical and legal aspects of the spam in detail. The technical details will include different techniques of spam control e.g., filtering techniques, Genetic Algorithm, Memory Based Classifier, Support Vector Machine Method, etc. The economic aspect includes Shaping/Rate Throttling Approach/Economic Filtering and Pricing/Payment based spam control. Finally, the paper discusses the legal provisions for the control of spam. The scope of the legal options is limited to USA, European Union, New Zealand, Canada, Britain and Australia.
