This paper provides a general result on controlling local Rademacher complexities, which captures in an elegant form to relate the complexities with constraint on the expected norm to the corresponding ones with constraint on the empirical norm. This result is convenient to apply in real applications and could yield refined local Rademacher complexity bounds for function classes satisfying general entropy conditions. We demonstrate the power of our complexity bounds by applying them to derive effective generalization error bounds.
This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions to reliably estimate/minimize Bellman error, and characterizes the generalization performance by (local) Rademacher complexities of general function classes, which makes initial steps in bridging the gap between statistical learning theory and batch RL. Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the followings: (1) In double sampling regime, the excess risk of Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class. (2) In the single sampling regime, sample-efficient risk minimization is not possible without further assumptions, regardless of algorithms. However, with completeness assumptions, the excess risk of FQI and a minimax style algorithm can be again bounded by the Rademacher complexity of the corresponding function classes. (3) Fast statistical rates can be achieved by using tools of local Rademacher complexity. Our analysis covers a wide range of function classes, including finite classes, linear spaces, kernel spaces, sparse linear features, etc.
We study covering numbers and local covering numbers with respect to difference graphs and complete bipartite graphs. In particular we show that in every cover of a Young diagram with $binom{2k}{k}$ steps with generalized rectangles there is a row or a column in the diagram that is used by at least $k+1$ rectangles, and prove that this is best-possible. This answers two questions by Kim, Martin, Masa{v{r}}{i}k, Shull, Smith, Uzzell, and Wang (Europ. J. Comb. 2020), namely: - What is the local complete bipartite cover number of a difference graph? - Is there a sequence of graphs with constant local difference graph cover number and unbounded local complete bipartite cover number? We add to the study of these local covering numbers with a lower bound construction and some examples. Following Kim emph{et al.}, we use the results on local covering numbers to provide lower and upper bounds for the local dimension of partially ordered sets of height~2. We discuss the local dimension of some posets related to Boolean lattices and show that the poset induced by the first two layers of the Boolean lattice has local dimension $(1 + o(1))log_2log_2 n$. We conclude with some remarks on covering numbers for digraphs and Ferrers dimension.
We define covering and separation numbers for functions. We investigate their properties, and show that for some classes of functions there is exact equality of separation and covering. We provide analogues for various geometric inequalities on covering numbers, such as volume bounds, bounds connected with Hadwigers conjecture, and inequalities about M-positions for geometric log-concave functions. In particular, we obtain stro
Fairness in algorithmic decision-making processes is attracting increasing concern. When an algorithm is applied to human-related decision-making an estimator solely optimizing its predictive power can learn biases on the existing data, which motivates us the notion of fairness in machine learning. while several different notions are studied in the literature, little studies are done on how these notions affect the individuals. We demonstrate such a comparison between several policies induced by well-known fairness criteria, including the color-blind (CB), the demographic parity (DP), and the equalized odds (EO). We show that the EO is the only criterion among them that removes group-level disparity. Empirical studies on the social welfare and disparity of these policies are conducted.
The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari (2008), which is established via Rademacher complexity theory by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catonis PAC-Bayes bounds (Catoni, 2007) using shifted Rademacher processes (Wegkamp, 2003; Lecu{e} and Mitchell, 2012; Zhivotovskiy and Hanneke, 2018). We then derive a new fast-rate PAC-Bayes bound in terms of the flatness of the empirical risk surface on which the posterior concentrates. Our analysis establishes a new framework for deriving fast-rate PAC-Bayes bounds and yields new insights on PAC-Bayesian theory.