No Arabic abstract
Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases. This allows one to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the natural covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an $ell_1$ penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an $ell_2$ penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the optimum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets.
Two-stage least squares (TSLS) estimators and variants thereof are widely used to infer the effect of an exposure on an outcome using instrumental variables (IVs). They belong to a wider class of two-stage IV estimators, which are based on fitting a conditional mean model for the exposure, and then using the fitted exposure values along with the covariates as predictors in a linear model for the outcome. We show that standard TSLS estimators enjoy greater robustness to model misspecification than more general two-stage estimators. However, by potentially using a wrong exposure model, e.g. when the exposure is binary, they tend to be inefficient. In view of this, we study double-robust G-estimators instead. These use working models for the exposure, IV and outcome but only require correct specification of either the IV model or the outcome model to guarantee consistent estimation of the exposure effect. As the finite sample performance of the locally efficient G-estimator can be poor, we further develop G-estimation procedures with improved efficiency and robustness properties under misspecification of some or all working models. Simulation studies and a data analysis demonstrate drastic improvements, with remarkably good performance even when one or more working models are misspecified.
Can two separate case-control studies, one about Hepatitis disease and the other about Fibrosis, for example, be combined together? It would be hugely beneficial if two or more separately conducted case-control studies, even for entirely irrelevant purposes, can be merged together with a unified analysis that produces better statistical properties, e.g., more accurate estimation of parameters. In this paper, we show that, when using the popular logistic regression model, the combined/integrative analysis produces a more accurate estimation of the slope parameters than the single case-control study. It is known that, in a single logistic case-control study, the intercept is not identifiable, contrary to prospective studies. In combined case-control studies, however, the intercepts are proved to be identifiable under mild conditions. The resulting maximum likelihood estimates of the intercepts and slopes are proved to be consistent and asymptotically normal, with asymptotic variances achieving the semiparametric efficiency lower bound.
A recent technique of randomized smoothing has shown that the worst-case (adversarial) $ell_2$-robustness can be transformed into the average-case Gaussian-robustness by smoothing a classifier, i.e., by considering the averaged prediction over Gaussian noise. In this paradigm, one should rethink the notion of adversarial robustness in terms of generalization ability of a classifier under noisy observations. We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be greatly controlled by simply regularizing the prediction consistency over noise. This relationship allows us to design a robust training objective without approximating a non-existing smoothed classifier, e.g., via soft smoothing. Our experiments under various deep neural network architectures and datasets show that the certified $ell_2$-robustness can be dramatically improved with the proposed regularization, even achieving better or comparable results to the state-of-the-art approaches with significantly less training costs and hyperparameters.
State-of-the-art classifiers have been shown to be largely vulnerable to adversarial perturbations. One of the most effective strategies to improve robustness is adversarial training. In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. We show in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more linear behaviour of the network. Using a locally quadratic approximation, we provide theoretical evidence on the existence of a strong relation between large robustness and small curvature. To further show the importance of reduced curvature for improving the robustness, we propose a new regularizer that directly minimizes curvature of the loss surface, and leads to adversarial robustness that is on par with adversarial training. Besides being a more efficient and principled alternative to adversarial training, the proposed regularizer confirms our claims on the importance of exhibiting quasi-linear behavior in the vicinity of data points in order to achieve robustness.
Adversarial robustness has become a topic of growing interest in machine learning since it was observed that neural networks tend to be brittle. We propose an information-geometric formulation of adversarial defense and introduce FIRE, a new Fisher-Rao regularization for the categorical cross-entropy loss, which is based on the geodesic distance between natural and perturbed input features. Based on the information-geometric properties of the class of softmax distributions, we derive an explicit characterization of the Fisher-Rao Distance (FRD) for the binary and multiclass cases, and draw some interesting properties as well as connections with standard regularization metrics. Furthermore, for a simple linear and Gaussian model, we show that all Pareto-optimal points in the accuracy-robustness region can be reached by FIRE while other state-of-the-art methods fail. Empirically, we evaluate the performance of various classifiers trained with the proposed loss on standard datasets, showing up to 2% of improvements in terms of robustness while reducing the training time by 20% over the best-performing methods.