For random field theory based multiple comparison corrections in brain imaging, it is often necessary to compute the distribution of the supremum of a random field. Unfortunately, computing this distribution is not easy and requires many distributional assumptions that may not hold for real data. Thus, there is a need for a different framework that does not rely on the traditional statistical hypothesis testing paradigm, which requires computing p-values. With this as motivation, we use a different approach, logistic regression, that does not require computing p-values and can still localize the regions of brain network differences. Unlike other discriminant and classification techniques that classify preselected feature vectors, the method here does not require any preselected feature vectors and performs the classification at each edge level.
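As a rough illustration of the edge-level idea, the sketch below fits a separate logistic regression at every connectivity edge on synthetic data and ranks edges by cross-validated accuracy. The data, the number of nodes, and the accuracy-based scoring are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of edge-level classification with logistic regression.
# Synthetic data stands in for real connectivity matrices; the per-edge
# loop mirrors the idea of classifying at each edge rather than on a
# preselected feature vector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_nodes = 40, 10
labels = np.repeat([0, 1], n_subjects // 2)           # two groups

# Upper-triangular edge weights for each subject (synthetic).
iu = np.triu_indices(n_nodes, k=1)
edges = rng.normal(size=(n_subjects, len(iu[0])))
edges[labels == 1, 3] += 1.0                          # planted group effect

# Fit a separate logistic regression at every edge and record
# cross-validated accuracy as an edge-level discrimination score.
scores = np.empty(edges.shape[1])
for e in range(edges.shape[1]):
    clf = LogisticRegression()
    scores[e] = cross_val_score(clf, edges[:, [e]], labels, cv=5).mean()

top = np.argsort(scores)[::-1][:5]
print("most discriminative edges:", list(zip(iu[0][top], iu[1][top])))
```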
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear-sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1 \pm \varepsilon)$-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state-of-the-art methods in the area. The experiments are conducted on real-world benchmark data for logistic regression.
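The following is a minimal importance-sampling sketch in the spirit of sensitivity sampling: points are drawn with non-uniform probabilities and reweighted so the subsample objective approximates the full one. The row-norm-based scores are a crude stand-in for true sensitivities, not the paper's scheme, and the comparison to the full fit is only illustrative.

```python
# Hedged sketch of importance (sensitivity-style) sampling for logistic
# regression, compared against a fit on the full data. The score used
# here (row-norm based) is a stand-in, NOT the paper's sensitivity scheme.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, d, m = 20000, 10, 500                     # full size, dim, coreset size
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(int)

# Crude importance scores; true sensitivities upper-bound each point's
# worst-case contribution to the loss over all parameter vectors.
scores = np.linalg.norm(X, axis=1) + 1.0 / n
p = scores / scores.sum()

idx = rng.choice(n, size=m, replace=True, p=p)
w = 1.0 / (m * p[idx])                       # inverse-probability weights

full = LogisticRegression().fit(X, y)
core = LogisticRegression().fit(X[idx], y[idx], sample_weight=w)
print("coef distance:", np.linalg.norm(full.coef_ - core.coef_))
```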
The cost of both generalized least squares (GLS) and Gibbs sampling in a crossed random effects model can easily grow faster than $N^{3/2}$ for $N$ observations. Ghosh et al. (2020) develop a backfitting algorithm that reduces the cost to $O(N)$. Here we extend that method to a generalized linear mixed model for logistic regression. We use backfitting within an iteratively reweighted penalized least squares algorithm. The specific approach is a version of penalized quasi-likelihood due to Schall (1991). A straightforward version of Schall's algorithm would also cost more than $N^{3/2}$ because it requires the trace of the inverse of a large matrix. We approximate that quantity at cost $O(N)$ and prove that this substitution makes an asymptotically negligible difference. Our backfitting algorithm also collapses the fixed effect with one random effect at a time in a way that is analogous to the collapsed Gibbs sampler of Papaspiliopoulos et al. (2020). We use a symmetric operator that facilitates efficient covariance computation. We illustrate our method on a real dataset from Stitch Fix. By properly accounting for crossed random effects we show that a naive logistic regression could underestimate sampling variances by several hundredfold.
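To convey the flavor of backfitting with two crossed random effects, here is a toy iteration for the linear analogue $y_{ij} = x_{ij}\beta + a_i + b_j + \varepsilon_{ij}$, where each pass over the data costs $O(N)$. The penalized quasi-likelihood weighting, the collapsing step, and the trace approximation are all omitted; the penalties `lam_a` and `lam_b` are illustrative stand-ins for variance ratios.

```python
# Toy backfitting sketch for a crossed random effects model
# y_ij = x_ij * beta + a_i + b_j + noise, alternating penalized updates
# for each effect block. This illustrates the O(N)-per-pass iteration
# only; it is a simplified linear-model stand-in, not the paper's
# PQL algorithm.
import numpy as np

rng = np.random.default_rng(2)
I, J, N = 200, 150, 5000
rows = rng.integers(I, size=N)
cols = rng.integers(J, size=N)
x = rng.normal(size=N)
a_true = rng.normal(scale=0.5, size=I)
b_true = rng.normal(scale=0.5, size=J)
y = 1.5 * x + a_true[rows] + b_true[cols] + rng.normal(size=N)

lam_a = lam_b = 4.0                      # ridge penalties (variance ratios)
a, b, beta = np.zeros(I), np.zeros(J), 0.0
for _ in range(50):
    r = y - a[rows] - b[cols]
    beta = (x @ r) / (x @ x)             # fixed effect given both blocks
    r = y - beta * x - b[cols]
    a = np.bincount(rows, r, I) / (np.bincount(rows, minlength=I) + lam_a)
    r = y - beta * x - a[rows]
    b = np.bincount(cols, r, J) / (np.bincount(cols, minlength=J) + lam_b)
print("beta estimate:", round(beta, 3))
```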
With the rapid development of social media sharing, people often need to manage a growing volume of multimedia data through tasks such as large-scale video classification and annotation, especially for videos containing human activities. Recently, manifold regularized semi-supervised learning (SSL), which exploits the intrinsic data probability distribution to improve generalization with only a small number of labeled examples, has emerged as a promising paradigm for semiautomatic video classification. In addition, human action videos often have multi-modal content and multiple representations. To tackle these problems, in this paper we propose multiview Hessian regularized logistic regression (mHLR) for human action recognition. Compared with existing work, the advantages of mHLR are threefold: (1) mHLR combines multiple Hessian regularizations, each obtained from a particular representation of the instances, to better exploit the local geometry; (2) mHLR naturally handles multi-view instances with multiple representations; (3) mHLR employs a smooth loss function and thus can be effectively optimized. We conduct extensive experiments on the unstructured social activity attribute (USAA) dataset, and the results demonstrate the effectiveness of the proposed multiview Hessian regularized logistic regression for human action recognition.
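As a loose illustration of multiview manifold regularization, the sketch below minimizes a logistic loss on a few labeled points plus one graph-based smoothness penalty per view. A kNN graph Laplacian stands in for the Hessian energy actually used by mHLR, and the views, weights, and optimizer are assumptions for illustration only.

```python
# Hedged sketch of multiview manifold-regularized logistic regression:
# logistic loss on labeled points plus one graph-based smoothness
# penalty per view. A graph Laplacian stands in for the Hessian energy
# used by mHLR; data, weights, and combination rule are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, n_lab, d = 300, 30, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)
lab = np.arange(n_lab)                        # indices of labeled points

def knn_laplacian(F, k=8):
    """Unnormalized Laplacian of a kNN graph built from features F."""
    D2 = ((F[:, None] - F[None]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D2[i])[1:k + 1]] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

# Two "views": raw features and a random projection of them.
views = [X, X @ rng.normal(size=(d, 3))]
Ls = [knn_laplacian(V) for V in views]
gammas = [0.01, 0.01]                         # per-view regularization

w = np.zeros(d)
for _ in range(500):                          # plain gradient descent
    p = 1 / (1 + np.exp(-X @ w))
    grad = X[lab].T @ (p[lab] - y[lab]) / n_lab
    f = X @ w
    for L, g in zip(Ls, gammas):
        grad += g * (X.T @ (L @ f)) / n       # smoothness term per view
    w -= 0.2 * grad
print("train acc:", ((X @ w > 0) == (y > 0.5)).mean())
```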
We propose a penalized likelihood method that simultaneously fits the multinomial logistic regression model and combines subsets of the response categories. The penalty is nondifferentiable when pairs of columns in the optimization variable are equal. This encourages pairwise equality of these columns in the estimator, which corresponds to response category combination. We use an alternating direction method of multipliers (ADMM) algorithm to compute the estimator, and we discuss the algorithm's convergence. Prediction and model selection are also addressed.
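To make the penalty concrete, the sketch below evaluates a group-fusion penalty on pairwise column differences and applies the corresponding proximal step to one pair of columns; when the difference is fully shrunk, the two columns become equal, i.e., the categories combine. The splitting and function names are illustrative, not the paper's exact penalty or ADMM.

```python
# Hedged sketch of the pairwise column-fusion idea: a group-lasso
# penalty on differences between coefficient columns, whose proximal
# step shrinks a pair of columns toward equality (category combination).
# Names and the splitting are illustrative, not the paper's exact ADMM.
import numpy as np

def fusion_penalty(B, lam):
    """lam * sum over category pairs of ||B[:, k] - B[:, l]||_2."""
    d, K = B.shape
    return lam * sum(np.linalg.norm(B[:, k] - B[:, l])
                     for k in range(K) for l in range(k + 1, K))

def prox_pair(bk, bl, t):
    """Joint proximal map of t * ||bk - bl||_2: group-soft-threshold
    the difference while keeping the pair's average fixed. When the
    difference is fully shrunk, the two columns become equal."""
    avg, diff = (bk + bl) / 2, (bk - bl) / 2
    nrm = np.linalg.norm(diff)
    shrink = max(0.0, 1.0 - t / nrm) if nrm > 0 else 0.0
    return avg + shrink * diff, avg - shrink * diff

B = np.array([[1.0, 1.1, -2.0],
              [0.5, 0.4,  3.0]])          # d=2 features, K=3 categories
print("penalty:", fusion_penalty(B, lam=0.5))
b0, b1 = prox_pair(B[:, 0], B[:, 1], t=0.5)
print("columns 0,1 after prox:", b0, b1)  # nearly-equal columns fuse
```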
In this paper, we propose an improved estimation method for logistic regression based on subsamples taken according to the optimal subsampling probabilities developed in Wang et al. (2018). Both asymptotic and numerical results show that the new estimator has higher estimation efficiency. We also develop a new algorithm based on Poisson subsampling, which does not require approximating the optimal subsampling probabilities all at once. This is computationally advantageous when the available random-access memory is not enough to hold the full data. Interestingly, the asymptotic distributions also show that Poisson subsampling produces a more efficient estimator if the sampling rate, the ratio of the subsample size to the full data sample size, does not converge to zero. We also obtain the unconditional asymptotic distribution for the estimator based on Poisson subsampling. The proposed approach requires a pilot estimator to correct the biases of unweighted estimators. We further show that even if the pilot estimator is inconsistent, the resulting estimators are still consistent and asymptotically normal if the model is correctly specified.
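A hedged sketch of the Poisson subsampling idea: after a pilot fit, each observation is retained independently with probability $\pi_i$, and the retained points are reweighted by $1/\pi_i$. The pilot-based scores below merely resemble optimal subsampling probabilities and are assumptions for illustration, not the exact probabilities of Wang et al. (2018).

```python
# Hedged sketch of Poisson subsampling for logistic regression: each
# point is kept independently with probability pi_i, then a weighted
# fit corrects the sampling bias. The pilot-based scores here are a
# simple stand-in for optimal subsampling probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, d, m = 50000, 8, 2000                  # full size, dim, expected subsample
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > rng.normal(size=n)).astype(int)

# Pilot estimator from a small uniform subsample.
pilot_idx = rng.choice(n, size=1000, replace=False)
pilot = LogisticRegression().fit(X[pilot_idx], y[pilot_idx])

# Importance scores resembling |y - p_hat| * ||x|| (A-optimality style).
p_hat = pilot.predict_proba(X)[:, 1]
scores = np.abs(y - p_hat) * np.linalg.norm(X, axis=1)
pi = np.minimum(1.0, m * scores / scores.sum())

# Poisson sampling: one pass over the data, no need to hold all
# probabilities in memory at once in a streaming implementation.
keep = rng.random(n) < pi
fit = LogisticRegression().fit(X[keep], y[keep], sample_weight=1.0 / pi[keep])
print("subsample size:", keep.sum(), "coef:", np.round(fit.coef_, 2))
```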