Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the \emph{Bayes error}, which measures the optimal classification error theoretically achievable for a given data distribution. While the Bayes error is generally intractable, we show that it can be computed exactly for generative models learned using normalizing flows. Our technique relies on a fundamental result: the Bayes error is invariant under invertible transformations. We can therefore compute the exact Bayes error of a learned flow model by computing it for its Gaussian base distribution, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some -- but not all -- cases these models achieve accuracy very close to optimal. Finally, we use our method to evaluate the intrinsic hardness of standard benchmark datasets, and of classes within those datasets.
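To make the central quantity concrete, the sketch below estimates the Bayes error of a class-conditional Gaussian problem (the base-distribution setting the abstract describes) by plain Monte Carlo. This is a stand-in, not the paper's method: the paper uses Holmes-Diaconis-Ross integration, which is not reproduced here, and the means, covariances, and priors are hypothetical illustration values.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_error_mc(means, covs, priors, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the Bayes error for class-conditional Gaussians.

    Bayes error = 1 - E_x[max_y p(y | x)], with x drawn from the mixture.
    """
    rng = np.random.default_rng(seed)
    K = len(priors)
    # Draw class counts, then samples from each class-conditional Gaussian.
    counts = rng.multinomial(n_samples, priors)
    x = np.concatenate([rng.multivariate_normal(means[k], covs[k], size=counts[k])
                        for k in range(K)])
    # Joint densities p(x, y) = prior_y * N(x; mu_y, Sigma_y), shape (N, K).
    joint = np.stack([priors[k] * multivariate_normal(means[k], covs[k]).pdf(x)
                      for k in range(K)], axis=1)
    post = joint / joint.sum(axis=1, keepdims=True)   # posteriors p(y | x)
    return 1.0 - post.max(axis=1).mean()              # expected top-1 error

# Two overlapping unit-covariance Gaussians in 2-D (hypothetical example).
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
print(bayes_error_mc(means, covs, priors=[0.5, 0.5]))
```

For this example the closed form is Phi(-||mu_1 - mu_0|| / 2) = Phi(-sqrt(2)/2), about 0.24, which the estimate should approach as the sample count grows.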
This technical note reviews state-of-the-art algorithms for the linear approximation of high-dimensional dynamical systems using low-rank dynamic mode decomposition (DMD). While repeating several parts of our article low-rank dynamic mode decomposition: a
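For orientation, here is a minimal numpy sketch of standard rank-truncated (exact) DMD, not any of the specific low-rank algorithms the note reviews; the snapshot matrices, rank, and toy system are illustrative assumptions.

```python
import numpy as np

def dmd(X, Y, r):
    """Rank-r DMD: fit a linear map Y ~ A X and return its leading eigenpairs.

    X, Y: (n, m) snapshot matrices with Y[:, j] the successor of X[:, j].
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r].T       # rank-r truncation of X
    A_tilde = (U.T @ Y @ V) / s               # r x r projection U^T A U
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ V / s) @ W                   # exact DMD modes
    return eigvals, modes

# Toy use: one trajectory of a stable high-dimensional linear system.
rng = np.random.default_rng(1)
n, steps, r = 64, 40, 4
A = np.linalg.qr(rng.standard_normal((n, n)))[0] * 0.95  # ||A||_2 = 0.95
snaps = [rng.standard_normal(n)]
for _ in range(steps):
    snaps.append(A @ snaps[-1])
S = np.stack(snaps, axis=1)
eigvals, modes = dmd(S[:, :-1], S[:, 1:], r)
# A_tilde is a projection of A, so all magnitudes are bounded by ||A||_2 = 0.95.
print(np.abs(eigvals))
```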
We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges. Research in IML has boomed in recent years. As young as the field is, it has over 20
Scientific knowledge should not be seen as a set of isolated fields, but rather as a highly connected network. Understanding how research areas are connected is of paramount importance for adequately allocating funding and human resources (e.g., assembling team
We present a new method for evaluating and training unnormalized density models. Our approach requires access only to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model
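The abstract is cut off before it specifies the estimator, so as a hedged stand-in the sketch below computes the kernelized Stein discrepancy with an RBF kernel, which likewise needs only the model's score $\nabla_x \log q(x)$. The bandwidth $h$ and the toy Gaussian model are assumptions, not details from the paper.

```python
import numpy as np

def ksd_rbf(x, score_q, h=1.0):
    """U-statistic estimate of the kernelized Stein discrepancy between
    samples x ~ p and a model q, using only the score d/dx log q(x).

    x: (n, d) samples from the data density p.
    score_q: callable returning the (n, d) gradients of log q at x.
    """
    n, d = x.shape
    s = score_q(x)                                # (n, d) model scores
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d) pairwise x - x'
    sq = (diff ** 2).sum(-1)                      # (n, n) squared distances
    K = np.exp(-sq / (2 * h ** 2))                # RBF kernel matrix
    # Stein kernel u_q(x, x'), assembled term by term.
    t1 = (s @ s.T) * K                                       # s(x).s(x') k
    t2 = np.einsum('id,ijd->ij', s, diff) / h ** 2 * K       # s(x).grad_{x'} k
    t3 = -np.einsum('jd,ijd->ij', s, diff) / h ** 2 * K      # s(x').grad_x k
    t4 = (d / h ** 2 - sq / h ** 4) * K                      # tr grad_x grad_{x'} k
    u = t1 + t2 + t3 + t4
    return (u.sum() - np.trace(u)) / (n * (n - 1))           # off-diagonal mean

# Toy check: the model q is a standard normal, so its score is -x.
rng = np.random.default_rng(0)
score = lambda x: -x
print(ksd_rbf(rng.standard_normal((500, 2)), score))         # near zero
print(ksd_rbf(rng.standard_normal((500, 2)) + 1.0, score))   # clearly larger
```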
We propose an empirical Bayes estimator based on a Dirichlet process mixture model for estimating the sparse normalized mean difference, which can be applied directly to high-dimensional linear classification. In theory, we build a bridge to conn
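The Dirichlet process mixture machinery is beyond a short snippet, so the toy sketch below substitutes simple soft-thresholding of the normalized mean differences for the empirical Bayes posterior mean and plugs the shrunken differences into a linear rule, in the spirit of nearest shrunken centroids. The threshold `tau` and the data are hypothetical, and this is not the paper's estimator.

```python
import numpy as np

def shrunken_diff_rule(X0, X1, tau=2.0):
    """Linear classifier built from shrunken normalized mean differences."""
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(0), X1.mean(0)
    var_p = (X0.var(0, ddof=1) * (n0 - 1) +
             X1.var(0, ddof=1) * (n1 - 1)) / (n0 + n1 - 2)   # pooled variance
    se = np.sqrt(var_p * (1 / n0 + 1 / n1))
    z = (mu1 - mu0) / se                                 # normalized differences
    w = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)    # shrunken sparse weights
    mid = (mu0 + mu1) / 2
    return lambda X: (((X - mid) / se) @ w > 0).astype(int)

# Hypothetical sparse setting: only 10 of 200 features separate the classes.
rng = np.random.default_rng(2)
d, n = 200, 100
delta = np.zeros(d)
delta[:10] = 1.5
X0 = rng.standard_normal((n, d))
X1 = rng.standard_normal((n, d)) + delta
clf = shrunken_diff_rule(X0, X1, tau=2.0)
Xtest = rng.standard_normal((50, d)) + delta             # class-1 test points
print(clf(Xtest).mean())   # fraction correctly assigned to class 1
```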