ترغب بنشر مسار تعليمي؟ اضغط هنا

DNN2LR: Automatic Feature Crossing for Credit Scoring

95   0   0.0 ( 0 )
 نشر من قبل Qiang Liu
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Credit scoring is a major application of machine learning for financial institutions to decide whether to approve or reject a credit loan. For sake of reliability, it is necessary for credit scoring models to be both accurate and globally interpretable. Simple classifiers, e.g., Logistic Regression (LR), are white-box models, but not powerful enough to model complex nonlinear interactions among features. Fortunately, automatic feature crossing is a promising way to find cross features to make simple classifiers to be more accurate without heavy handcrafted feature engineering. However, credit scoring is usually based on different aspects of users, and the data usually contains hundreds of feature fields. This makes existing automatic feature crossing methods not efficient for credit scoring. In this work, we find local piece-wise interpretations in Deep Neural Networks (DNNs) of a specific feature are usually inconsistent in different samples, which is caused by feature interactions in the hidden layers. Accordingly, we can design an automatic feature crossing method to find feature interactions in DNN, and use them as cross features in LR. We give definition of the interpretation inconsistency in DNN, based on which a novel feature crossing method for credit scoring prediction called DNN2LR is proposed. Apparently, the final model, i.e., a LR model empowered with cross features, generated by DNN2LR is a white-box model. Extensive experiments have been conducted on both public and business datasets from real-world credit scoring applications. Experimental shows that, DNN2LR can outperform the DNN model, as well as several feature crossing methods. Moreover, comparing with the state-of-the-art feature crossing methods, i.e., AutoCross, DNN2LR can accelerate the speed for feature crossing by about 10 to 40 times on datasets with large numbers of feature fields.

قيم البحث

اقرأ أيضاً

For sake of reliability, it is necessary for models in real-world applications to be both powerful and globally interpretable. Simple classifiers, e.g., Logistic Regression (LR), are globally interpretable, but not powerful enough to model complex no nlinear interactions among features in tabular data. Meanwhile, Deep Neural Networks (DNNs) have shown great effectiveness for modeling tabular data, but is not globally interpretable. In this work, we find local piece-wise interpretations in DNN of a specific feature are usually inconsistent in different samples, which is caused by feature interactions in the hidden layers. Accordingly, we can design an automatic feature crossing method to find feature interactions in DNN, and use them as cross features in LR. We give definition of the interpretation inconsistency in DNN, based on which a novel feature crossing method called DNN2LR is proposed. Extensive experiments have been conducted on four public datasets and two real-world datasets. The final model, i.e., a LR model empowered with cross features, generated by DNN2LR can outperform the complex DNN model, as well as several state-of-the-art feature crossing methods. The experimental results strongly verify the effectiveness and efficiency of DNN2LR, especially on real-world datasets with large numbers of feature fields.
352 - Weiyu Guo , Zhijiang Yang , Shu Wu 2021
Due to the powerful learning ability on high-rank and non-linear features, deep neural networks (DNNs) are being applied to data mining and machine learning in various fields, and exhibit higher discrimination performance than conventional methods. H owever, the applications based on DNNs are rare in enterprise credit rating tasks because most of DNNs employ the end-to-end learning paradigm, which outputs the high-rank representations of objects and predictive results without any explanations. Thus, users in the financial industry cannot understand how these high-rank representations are generated, what do they mean and what relations exist with the raw inputs. Then users cannot determine whether the predictions provided by DNNs are reliable, and not trust the predictions providing by such black box models. Therefore, in this paper, we propose a novel network to explicitly model the enterprise credit rating problem using DNNs and attention mechanisms. The proposed model realizes explainable enterprise credit ratings. Experimental results obtained on real-world enterprise datasets verify that the proposed approach achieves higher performance than conventional methods, and provides insights into individual rating results and the reliability of model training.
The aim of this project is to develop and test advanced analytical methods to improve the prediction accuracy of Credit Risk Models, preserving at the same time the model interpretability. In particular, the project focuses on applying an explainable machine learning model to bank-related databases. The input data were obtained from open data. Over the total proven models, CatBoost has shown the highest performance. The algorithm implementation produces a GINI of 0.68 after tuning the hyper-parameters. SHAP package is used to provide a global and local interpretation of the model predictions to formulate a human-comprehensive approach to understanding the decision-maker algorithm. The 20 most important features are selected using the Shapley values to present a full human-understandable model that reveals how the attributes of an individual are related to its model prediction.
Credit scoring models, which are among the most potent risk management tools that banks and financial institutes rely on, have been a popular subject for research in the past few decades. Accordingly, many approaches have been developed to address th e challenges in classifying loan applicants and improve and facilitate decision-making. The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of features in credit scoring datasets, pose difficulties in developing and implementing effective credit scoring models, targeting the generalization power of classification models on unseen data. In this paper, we propose the Bagging Supervised Autoencoder Classifier (BSAC) that mainly leverages the superior performance of the Supervised Autoencoder, which learns low-dimensional embeddings of the input data exclusively with regards to the ultimate classification task of credit scoring, based on the principles of multi-task learning. BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class. The obtained results from our experiments on the benchmark and real-life credit scoring datasets illustrate the robustness and effectiveness of the Bagging Supervised Autoencoder Classifier in the classification of loan applicants that can be regarded as a positive development in credit scoring models.
Feature crossing captures interactions among categorical features and is useful to enhance learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its custo mers, ranging from banks, hospitals, to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which is not yet visited by existing works. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross. It is shown that AutoCross can significantly enhance the performance of both linear and deep models.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا