In this work we build a stack of machine learning models that together form a state-of-the-art credit rating and default prediction system with excellent out-of-sample performance. Our approach tours the most recent ML/AI concepts. It starts with natural language processing (NLP) applied to textual descriptions of economic sectors, using embeddings and autoencoders (AE). It then classifies defaultable firms on the basis of a wide range of economic features using gradient boosting machines (GBM), and calibrates their probabilities with due attention to the treatment of unbalanced samples. Finally, we assign credit ratings through genetic algorithms (differential evolution, DE). Model interpretability is achieved by implementing recent techniques such as SHAP and LIME, which explain predictions locally in feature space.
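The calibration step mentioned above has to cope with class imbalance. As a minimal sketch of one common treatment (an assumption for illustration; the abstract does not say which treatment is used), suppose the non-default class was randomly undersampled during training, keeping a fraction `beta` of it. A probability `p_s` predicted on the resampled data can then be mapped back to the original class balance. The names `correct_undersampled_probability`, `p_s`, and `beta` are hypothetical.

```python
def correct_undersampled_probability(p_s: float, beta: float) -> float:
    """Map a default probability learned on undersampled data back to the
    original class balance.

    Assumption (not from the source): non-defaulters were kept with
    probability `beta` (0 < beta <= 1) and all defaulters were kept, so the
    odds seen by the model are the true odds divided by beta.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must lie in [0, 1]")
    # True odds = beta * sample odds, rewritten as a probability.
    return beta * p_s / (beta * p_s + 1.0 - p_s)


if __name__ == "__main__":
    # Hypothetical scores from a model trained with 10% of non-defaulters kept.
    for p_s in (0.1, 0.5, 0.9):
        print(p_s, round(correct_undersampled_probability(p_s, beta=0.1), 4))
```

With `beta = 1` (no undersampling) the mapping leaves the probability unchanged, which is a quick sanity check on the formula.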
A key element of the banking industry is the appropriate selection of customers. To manage credit risk, banks devote special effort to classifying customers according to their risk. The usual decision-making process consists of gathering personal and financial information about the borrower. Processing this information can be time-consuming and is complicated by the heterogeneous structure of the data. In this paper we offer an alternative method that classifies customer profiles from numerical and nominal attributes. The key feature of our method, called LVQ+PSO, is that it finds a reduced set of classification rules. This is possible thanks to the combination of a competitive neural network with an optimization technique. These rules constitute a predictive model for credit approval. The small number of rules makes the method useful not only for credit officers who need to make quick decisions about granting credit, but also as a tool for borrower self-selection. We applied our method to a real database from a consumer credit institution in Ecuador and obtained very satisfactory results. Directions for future research are outlined.
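The competitive-network half of LVQ+PSO can be illustrated with the textbook LVQ1 update, in which the prototype nearest to a sample moves toward it when their classes match and away from it otherwise. This is only a sketch under stated assumptions: it handles numerical attributes only, it omits the PSO component and the rule extraction, and the function name `lvq1_update` and the toy data are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def lvq1_update(prototypes: List[List[float]], labels: Sequence[str],
                x: Sequence[float], y: str, lr: float = 0.1) -> int:
    """Apply one LVQ1 step in place and return the index of the winner.

    prototypes: one numeric vector per prototype (modified in place).
    labels:     class label attached to each prototype.
    x, y:       a training sample and its class.
    lr:         learning rate (hypothetical default).
    """
    # Competitive step: the winner is the prototype closest to x
    # in squared Euclidean distance.
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    winner = dists.index(min(dists))
    # Attract the winner if its class matches the sample, repel it otherwise.
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [p_i + sign * lr * (x_i - p_i)
                          for p_i, x_i in zip(prototypes[winner], x)]
    return winner


if __name__ == "__main__":
    # Toy data: two prototypes in a two-attribute space.
    protos = [[0.0, 0.0], [1.0, 1.0]]
    classes = ["good", "bad"]
    lvq1_update(protos, classes, x=[0.2, 0.0], y="good")
    print(protos)
```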
In this paper, we propose a methodology based on a piecewise homogeneous Markov chain for credit ratings and a multivariate model of credit spreads to evaluate financial risk in the European Union (EU). Two main aspects are considered: how the financial risk is distributed among European countries, and how large the total risk is. The first aspect is evaluated by means of the expected value of a dynamic entropy measure. The second is addressed by computing the evolution of the total credit spread over time. Moreover, the covariance between countries' total spreads helps to identify contagion within the EU. The methodology is applied to real data for 24 countries from the three major agencies: Moody's, Standard & Poor's, and Fitch. The results suggest that both the inequality of financial risk and the value of the total risk increase over time, at a rate that depends on the rating agency, and that the dependence structure is characterized by strong correlation between most European countries.
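The two quantities discussed above, total spread and how evenly it is spread across countries, can be sketched in a few lines. The example below is a deliberate simplification and not the paper's model: it uses a single homogeneous transition matrix instead of a piecewise homogeneous one, a fixed spread per rating class instead of a multivariate spread model, and the Shannon entropy of expected-spread shares instead of the expected value of the entropy. The matrix `P`, the spreads `SPREADS`, and the countries `X`, `Y`, `Z` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def step(dist: List[float], P: List[List[float]]) -> List[float]:
    """Advance a rating distribution one period: row vector times matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def share_entropy(values: List[float]) -> float:
    """Shannon entropy of the shares values[i] / sum(values)."""
    total = sum(values)
    shares = [v / total for v in values]
    return -sum(s * math.log(s) for s in shares if s > 0)


def risk_profile(countries: Dict[str, List[float]], P: List[List[float]],
                 spreads: List[float], horizon: int) -> List[Tuple[float, float]]:
    """Return (total expected spread, entropy of shares) for t = 0..horizon."""
    dists = {name: list(d) for name, d in countries.items()}
    profile = []
    for _ in range(horizon + 1):
        expected = [sum(p * s for p, s in zip(d, spreads)) for d in dists.values()]
        profile.append((sum(expected), share_entropy(expected)))
        dists = {name: step(d, P) for name, d in dists.items()}
    return profile


# Hypothetical three-class rating system (high, medium, low grade).
P = [[0.90, 0.10, 0.00],
     [0.05, 0.85, 0.10],
     [0.00, 0.10, 0.90]]
SPREADS = [50.0, 150.0, 400.0]  # basis points per rating class, made up
COUNTRIES = {"X": [1.0, 0.0, 0.0], "Y": [0.0, 1.0, 0.0], "Z": [0.0, 0.0, 1.0]}

if __name__ == "__main__":
    for t, (total, entropy) in enumerate(risk_profile(COUNTRIES, P, SPREADS, 5)):
        print(t, round(total, 2), round(entropy, 4))
```

The entropy is largest, `log(n)` for `n` countries, when every country carries the same share of the total spread, so lower values indicate a more unequal distribution of risk.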
The granting process of every credit institution rejects applicants who seem too risky with regard to the repayment of their debt. A credit score is calculated and compared with a cut-off value below which an applicant is rejected. Developing a new score requires a learning dataset in which the response variable (good/bad borrower) is known, so rejected applicants are de facto excluded from the learning process. We first introduce the context and some useful notation. Then we formalize whether this particular sampling has consequences for the score's relevance. Finally, we elaborate on methods that use the characteristics of non-financed clients and conclude, using data from Credit Agricole Consumer Finance, that none of these methods is satisfactory in practice. ----- A credit granting system may refuse loan applications judged too risky. Within this system, the credit score provides a value measuring the risk of default, which is compared with an acceptability threshold. This score is built exclusively on data from financed clients, which contain in particular the "good or bad payer" information, yet it is subsequently applied to all applications. Is such a score statistically relevant? In this note, we clarify and formalize this question and study the effect of the absence of non-financed clients on the resulting scores. We then present methods for reintegrating non-financed clients and conclude, on the basis of data from Credit Agricole Consumer Finance, that they are ineffective in practice.
Recently, incomplete-market techniques have been used to develop a model for credit default swaps (CDSs), with results that differ markedly from those of the market-standard model. This article uses the new incomplete-market model to study CDS hedging further and extends the model so that it can treat single-name CDS portfolios. In addition, a hedge called the vanilla hedge is described, and with it analytic results are obtained that explain the striking features of the plot of no-arbitrage bounds versus CDS maturity for illiquid CDSs. The valuation process that follows from the incomplete-market model is an integrated modelling and risk management procedure: the model is first used to find the arbitrage-free range of fair prices, and risk management professionals for both the buyer and the seller then find, as a basis for negotiation, prices that respect this range and also benefit their firms. Finally, a section on numerical results illustrates the striking behavior of the no-arbitrage bounds as a function of CDS maturity, and several examples describe the risk reduction achieved by hedging single-name CDS portfolios.
This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. The results show that the present approaches outperform modern machine-learning-based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.