An Empirical Study on Feature Discretization

78 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Qiang Liu

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Qiang Liu - Zhaocheng Liu - Haoli Zhang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

When dealing with continuous numeric features, we usually adopt feature discretization. In this work, to find the best way to conduct feature discretization, we present some theoretical analysis, in which we focus on analyzing correctness and robustness of feature discretization. Then, we propose a novel discretization method called Local Linear Encoding (LLE). Experiments on two numeric datasets show that, LLE can outperform conventional discretization method with much fewer model parameters.

قيم البحث

509 - Deqing Wang , Hui Zhang , Rui Liu 2013

Much work has been done on feature selection. Existing methods are based on document frequency, such as Chi-Square Statistic, Information Gain etc. However, these methods have two shortcomings: one is that they are not reliable for low-frequency term s, and the other is that they only count whether one term occurs in a document and ignore the term frequency. Actually, high-frequency terms within a specific category are often regards as discriminators. This paper focuses on how to construct the feature selection function based on term frequency, and proposes a new approach based on $t$-test, which is used to measure the diversity of the distributions of a term between the specific category and the entire corpus. Extensive comparative experiments on two text corpora using three classifiers show that our new approach is comparable to or or slightly better than the state-of-the-art feature selection methods (i.e., $chi^2$, and IG) in terms of macro-$F_1$ and micro-$F_1$.

التعلم الآلي استرجاع المعلومات التعلم الالي

Neural networks behave as hash encoders: An empirical study

120 - Fengxiang He , Shiye Lei , Jianmin Ji 2021

The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the fol lowing encoding properties across a variety of deep learning models: (1) {it determinism}: almost every linear region contains at most one training example. We can therefore represent almost every training example by a unique activation pattern, which is parameterized by a {it neural code}; and (2) {it categorization}: according to the neural code, simple algorithms, such as $K$-Means, $K$-NN, and logistic regression, can achieve fairly good performance on both training and test data. These encoding properties surprisingly suggest that {it normal neural networks well-trained for classification behave as hash encoders without any extra efforts.} In addition, the encoding properties exhibit variability in different scenarios. {Further experiments demonstrate that {it model size}, {it training time}, {it training sample size}, {it regularization}, and {it label noise} contribute in shaping the encoding properties, while the impacts of the first three are dominant.} We then define an {it activation hash phase chart} to represent the space expanded by {model size}, training time, training sample size, and the encoding properties, which is divided into three canonical regions: {it under-expressive regime}, {it critically-expressive regime}, and {it sufficiently-expressive regime}. The source code package is available at url{https://github.com/LeavesLei/activation-code}.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

An empirical study on large scale text classification with skip-gram embeddings

89 - Georgios Balikas , Massih-Reza Amini 2016

We investigate the integration of word embeddings as classification features in the setting of large scale text classification. Such representations have been used in a plethora of tasks, however their application in classification scenarios with tho usands of classes has not been extensively researched, partially due to hardware limitations. In this work, we examine efficient composition functions to obtain document-level from word-level embeddings and we subsequently investigate their combination with the traditional one-hot-encoding representations. By presenting empirical evidence on large, multi-class, multi-label classification problems, we demonstrate the efficiency and the performance benefits of this combination.

الحساب واللغة استرجاع المعلومات

Differentially Private Representation for NLP: Formal Guarantee and An Empirical Study on Privacy and Fairness

136 - Lingjuan Lyu , Xuanli He , Yitong Li 2020

It has been demonstrated that hidden representation learned by a deep model can encode private information of the input, hence can be exploited to recover such information with reasonable accuracy. To address this issue, we propose a novel approach c alled Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text. DPNR utilises Differential Privacy (DP) to provide a formal privacy guarantee. Further, we show that masking words via dropout can further enhance privacy. To maintain utility of the learned representation, we integrate DP-noisy representation into a robust training process to derive a robust target model, which also helps for model fairness over various demographic variables. Experimental results on benchmark datasets under various parameter settings demonstrate that DPNR largely reduces privacy leakage without significantly sacrificing the main task performance.

التعلم الآلي التشفير والأمن التعلم الالي

Overfitting in Bayesian Optimization: an empirical study and early-stopping solution

169 - Anastasia Makarova , Huibin Shen , Valerio Perrone 2021

Tuning machine learning models with Bayesian optimization (BO) is a successful strategy to find good hyperparameters. BO defines an iterative procedure where a cross-validated metric is evaluated on promising hyperparameters. In practice, however, an improvement of the validation metric may not translate in better predictive performance on a test set, especially when tuning models trained on small datasets. In other words, unlike conventional wisdom dictates, BO can overfit. In this paper, we carry out the first systematic investigation of overfitting in BO and demonstrate that this issue is serious, yet often overlooked in practice. We propose a novel criterion to early stop BO, which aims to maintain the solution quality while saving the unnecessary iterations that can lead to overfitting. Experiments on real-world hyperparameter optimization problems show that our approach effectively meets these goals and is more adaptive comparing to baselines.

التعلم الآلي الذكاء الاصطناعي التعلم الالي