No Arabic abstract
We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The main question investigated in this paper is model mis-specification under the $varepsilon$-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length $T$, we assume that customers make purchases according to a well specified underlying multinomial logit choice model in a ($1-varepsilon$)-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining $varepsilon$-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factor in T when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter $varepsilon$. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds (UCB) and Thompson sampling.
We consider assortment optimization over a continuous spectrum of products represented by the unit interval, where the sellers problem consists of determining the optimal subset of products to offer to potential customers. To describe the relation between assortment and customer choice, we propose a probabilistic choice model that forms the continuous counterpart of the widely studied discrete multinomial logit model. We consider the sellers problem under incomplete information, propose a stochastic-approximation type of policy, and show that its regret -- its performance loss compared to the optimal policy -- is only logarithmic in the time horizon. We complement this result by showing a matching lower bound on the regret of any policy, implying that our policy is asymptotically optimal. We then show that adding a capacity constraint significantly changes the structure of the problem: we construct a policy and show that its regret after $T$ time periods is bounded above by a constant times $T^{2/3}$ (up to a logarithmic term); in addition, we show that the regret of any policy is bounded from below by a positive constant times $T^{2/3}$, so that also in the capacitated case we obtain asymptotic optimality. Numerical illustrations show that our policies outperform or are on par with alternatives.
We consider the robust filtering problem for a state-space model with outliers in correlated measurements. We propose a new robust filtering framework to further improve the robustness of conventional robust filters. Specifically, the measurement fitting error is processed separately during the reweighting procedure, which differs from existing solutions where a jointly processed scheme is involved. Simulation results reveal that, under the same setup, the proposed method outperforms the existing robust filter when the outlier-contaminated measurements are correlated, while it has the same performance as the existing one in the presence of uncorrelated measurements since these two types of robust filters are equivalent under such a circumstance.
Many machine learning tasks involve subpopulation shift where the testing data distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed the use of a variant of empirical risk minimization(ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe that DRO performs relatively poorly, and moreover has severe instability. We identify one direct cause of this phenomenon: sensitivity of DRO to outliers in the datasets. To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers. We instantiate DORO for the Cressie-Read family of Renyi divergence, and delve into two specific instances of this family: CVaR and $chi^2$-DRO. We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets, thereby positively addressing the open question raised by Hashimoto et al., 2018.
In practical data analysis under noisy environment, it is common to first use robust methods to identify outliers, and then to conduct further analysis after removing the outliers. In this paper, we consider statistical inference of the model estimated after outliers are removed, which can be interpreted as a selective inference (SI) problem. To use conditional SI framework, it is necessary to characterize the events of how the robust method identifies outliers. Unfortunately, the existing methods cannot be directly used here because they are applicable to the case where the selection events can be represented by linear/quadratic constraints. In this paper, we propose a conditional SI method for popular robust regressions by using homotopy method. We show that the proposed conditional SI method is applicable to a wide class of robust regression and outlier detection methods and has good empirical performance on both synthetic data and real data experiments.
Recent studies show a close connection between neural networks (NN) and kernel methods. However, most of these analyses (e.g., NTK) focus on the influence of (infinite) width instead of the depth of NN models. There remains a gap between theory and practical network designs that benefit from the depth. This paper first proposes a novel kernel family named Neural Optimization Kernel (NOK). Our kernel is defined as the inner product between two $T$-step updated functionals in RKHS w.r.t. a regularized optimization problem. Theoretically, we proved the monotonic descent property of our update rule for both convex and non-convex problems, and a $O(1/T)$ convergence rate of our updates for convex problems. Moreover, we propose a data-dependent structured approximation of our NOK, which builds the connection between training deep NNs and kernel methods associated with NOK. The resultant computational graph is a ResNet-type finite width NN. Our structured approximation preserved the monotonic descent property and $O(1/T)$ convergence rate. Namely, a $T$-layer NN performs $T$-step monotonic descent updates. Notably, we show our $T$-layered structured NN with ReLU maintains a $O(1/T)$ convergence rate w.r.t. a convex regularized problem, which explains the success of ReLU on training deep NN from a NN architecture optimization perspective. For the unsupervised learning and the shared parameter case, we show the equivalence of training structured NN with GD and performing functional gradient descent in RKHS associated with a fixed (data-dependent) NOK at an infinity-width regime. For finite NOKs, we prove generalization bounds. Remarkably, we show that overparameterized deep NN (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time. Extensive experiments verify the robustness of our structured NOK blocks.