No Arabic abstract
Dependence strucuture estimation is one of the important problems in machine learning domain and has many applications in different scientific areas. In this paper, a theoretical framework for such estimation based on copula and copula entropy -- the probabilistic theory of representation and measurement of statistical dependence, is proposed. Graphical models are considered as a special case of the copula framework. A method of the framework for estimating maximum spanning copula is proposed. Due to copula, the method is irrelevant to the properties of individual variables, insensitive to outlier and able to deal with non-Gaussianity. Experiments on both simulated data and real dataset demonstrated the effectiveness of the proposed method.
In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance trade off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.
We present a joint copula-based model for insurance claims and sizes. It uses bivariate copulae to accommodate for the dependence between these quantities. We derive the general distribution of the policy loss without the restrictive assumption of independence. We illustrate that this distribution tends to be skewed and multi-modal, and that an independence assumption can lead to substantial bias in the estimation of the policy loss. Further, we extend our framework to regression models by combining marginal generalized linear models with a copula. We show that this approach leads to a flexible class of models, and that the parameters can be estimated efficiently using maximum-likelihood. We propose a test procedure for the selection of the optimal copula family. The usefulness of our approach is illustrated in a simulation study and in an analysis of car insurance policies.
In this paper we present a novel approach for firm default probability estimation. The methodology is based on multivariate contingent claim analysis and pair copula constructions. For each considered firm, balance sheet data are used to assess the asset value, and to compute its default probability. The asset pricing function is expressed via a pair copula construction, and it is approximated via Monte Carlo simulations. The methodology is illustrated through an application to the analysis of both operative and defaulted firms.
Tail dependence refers to clustering of extreme events. In the context of financial risk management, the clustering of high-severity risks has a devastating effect on the well-being of firms and is thus of pivotal importance in risk analysis.When it comes to quantifying the extent of tail dependence, it is generally agreed that measures of tail dependence must be independent of the marginal distributions of the risks but rather solely copula-dependent. Indeed, all classical measures of tail dependence are such, but they investigate the amount of tail dependence along the main diagonal of copulas, which has often little in common with the concentration of extremes in the copulas domain of definition.In this paper we urge that the classical measures of tail dependence may underestimate the level of tail dependence in copulas. For the Gaussian copula, however, we prove that the classical measures are maximal. The implication of the result is two-fold: On the one hand, it means that in the Gaussian case, the (weak) measures of tail dependence that have been reported and used are of utmost prudence, which must be a reassuring news for practitioners. On the other hand, it further encourages substitution of the Gaussian copula with other copulas that are more tail dependent.
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem: we can observe only positive examples. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. However, such methods have two main drawbacks particularly in large-scale applications; (1) the pairwise approach is severely inefficient due to the quadratic computational cost; and (2) even recent model-based samplers (e.g. IRGAN) cannot achieve practical efficiency due to the training of an extra model. In this paper, we propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart while performing similarly to the pairwise counterpart in terms of ranking effectiveness. Our approach estimates the probability densities of positive items for each user within a rich class of distributions, viz. emph{exponential family}. In our formulation, we derive a loss function and the appropriate negative sampling distribution based on maximum likelihood estimation. We also develop a practical technique for risk approximation and a regularisation scheme. We then discuss that our single-model approach is equivalent to an IRGAN variant under a certain condition. Through experiments on real-world datasets, our approach outperforms the pointwise and pairwise counterparts in terms of effectiveness and efficiency.