Estimation of the model parameters of computer simulators, also known as calibration, is an important topic in many engineering applications. In this paper, we consider the calibration of computer model parameters with the help of engineering design knowledge. We introduce the concept of sensible (calibration) variables. Sensible variables are model parameters that are sensitive in the engineering modeling and whose optimal values differ from the engineering design values. We propose an effective calibration method to identify and adjust the sensible variables with limited physical experimental data. The methodology is applied to a composite fuselage simulation problem.
Shape control is critical to ensure the quality of composite fuselage assembly. In current practice, the structures are adjusted to the design shape in terms of the $\ell_2$ loss for further assembly, without considering the existing dimensional gap between the two structures. Such practice has two limitations: (1) the design shape may not be the optimal shape for a pair of incoming fuselages with different incoming dimensions; (2) the maximum gap is the key concern during the fuselage assembly process. This paper proposes an optimal shape control methodology via the $\ell_\infty$ loss for the composite fuselage assembly process by considering the existing dimensional gap between the incoming pair of fuselages. In addition, because the number of available actuators is limited in practice, we face the important problem of finding the best locations for the actuators among many potential locations, which makes the problem a sparse estimation problem. We are the first to solve the optimal shape control problem in the fuselage assembly process using the $\ell_\infty$ model under the framework of sparse estimation, where we use the $\ell_1$ penalty to control the sparsity of the resulting estimator. From a statistical point of view, this can be formulated as $\ell_\infty$-loss-based linear regression. Under some standard assumptions, such as the restricted eigenvalue (RE) conditions and light-tailed noise, the non-asymptotic estimation error of the $\ell_1$-regularized $\ell_\infty$ linear model is derived to be of the order $O\left(\sigma\sqrt{\frac{S\log p}{n}}\right)$, which meets the upper bound in the existing literature. Compared to the current practice, the case study shows that our proposed method significantly reduces the maximum gap between two fuselages after shape adjustments.
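As an illustration of the kind of estimator this abstract describes, an $\ell_1$-penalized $\ell_\infty$ regression can be written as a linear program: minimize $t + \lambda \sum_j u_j$ subject to $|y_i - x_i^\top \beta| \le t$ and $|\beta_j| \le u_j$. The sketch below is a minimal version under stated assumptions, not the authors' implementation: the function name `linf_lasso`, the penalty weight `lam`, and the synthetic data are all hypothetical, and `scipy.optimize.linprog` is used only as a generic LP solver.

```python
import numpy as np
from scipy.optimize import linprog


def linf_lasso(X, y, lam):
    """Minimize max_i |y_i - x_i' beta| + lam * ||beta||_1 via a linear program.

    Decision vector z = [beta (p), u (p), t (1)], where u bounds |beta|
    elementwise and t bounds the largest absolute residual.
    """
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), lam * np.ones(p), [1.0]])

    ones_col = np.ones((n, 1))
    zeros_np = np.zeros((n, p))
    eye_p = np.eye(p)
    zeros_col = np.zeros((p, 1))

    A_ub = np.vstack([
        np.hstack([X, zeros_np, -ones_col]),    #  X beta - t <=  y
        np.hstack([-X, zeros_np, -ones_col]),   # -X beta - t <= -y
        np.hstack([eye_p, -eye_p, zeros_col]),  #  beta - u <= 0
        np.hstack([-eye_p, -eye_p, zeros_col]), # -beta - u <= 0
    ])
    b_ub = np.concatenate([y, -y, np.zeros(2 * p)])

    # beta is free; u and t are non-negative.
    bounds = [(None, None)] * p + [(0, None)] * (p + 1)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.x[-1]


# Synthetic example (hypothetical data): two active coefficients out of eight.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
beta_true = np.zeros(8)
beta_true[[1, 5]] = [2.0, -1.5]
y = X @ beta_true + 0.05 * rng.uniform(-1.0, 1.0, size=40)

beta_hat, max_resid = linf_lasso(X, y, lam=0.1)
beta_zero, t_zero = linf_lasso(X, y, lam=1e3)  # very large penalty
```

In the shape-control reading, the columns of `X` would correspond to candidate actuator locations and `max_resid` to the largest remaining gap; a sufficiently large `lam` drives every coefficient to zero, which is the sparsity mechanism the $\ell_1$ penalty provides.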
In the machine learning domain, active learning is an iterative data selection algorithm for maximizing information acquisition and improving model performance with limited training samples. It is especially useful for industrial applications where training samples are expensive, time-consuming, or difficult to obtain. Existing methods mainly focus on active learning for classification, and only a few methods are designed for regression, such as linear regression or Gaussian processes. Uncertainties from measurement errors and intrinsic input noise inevitably exist in the experimental data, which further affects the modeling performance. The existing active learning methods for Gaussian processes do not incorporate these uncertainties. In this paper, we propose two new active learning algorithms for the Gaussian process with uncertainties: a variance-based weighted active learning algorithm and a D-optimal weighted active learning algorithm. Through a numerical study, we show that the proposed approach can incorporate the impact of uncertainties and achieve better prediction performance. This approach has been applied to improving the predictive modeling for automatic shape control of composite fuselage.
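To make the variance-based selection idea concrete, the sketch below picks the candidate input with the largest Gaussian process posterior variance. It is a plain, unweighted baseline under stated assumptions, not the weighted algorithms proposed in the abstract: the RBF kernel, the unit prior variance, the `noise_var` term standing in for measurement error, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel with unit prior variance."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def posterior_variance(X_train, X_cand, noise_var=1e-2, length_scale=1.0):
    """GP posterior variance of the latent function at each candidate point."""
    K = rbf_kernel(X_train, X_train, length_scale)
    K += noise_var * np.eye(len(X_train))  # assumed measurement-noise variance
    K_star = rbf_kernel(X_train, X_cand, length_scale)
    solved = np.linalg.solve(K, K_star)
    return 1.0 - np.sum(K_star * solved, axis=0)


def select_next(X_train, X_cand, **kernel_args):
    """Index of the candidate with the largest posterior variance."""
    return int(np.argmax(posterior_variance(X_train, X_cand, **kernel_args)))


# Two observed inputs and three candidates (hypothetical values).
X_train = np.array([[0.0], [1.0]])
X_cand = np.array([[0.0], [0.5], [3.0]])
variances = posterior_variance(X_train, X_cand)
next_index = select_next(X_train, X_cand)
```

The candidate far from the observed inputs is the one selected here, since the posterior variance depends only on input locations and not on the observed responses.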
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front line of the disease's diffusion from the northeast to the south. One of the research objectives for the infectious disease community is to identify environmental and economic variables that are associated with the emergence of Lyme disease. In this paper, we use a spatial Poisson regression model to link the spatial disease counts with environmental and economic variables, and develop a spatial variable selection procedure that effectively identifies important factors by using an adaptive elastic net penalty. The proposed methods can automatically select important covariates while adjusting for possible spatial correlations of disease counts. The performance of the proposed method is studied and compared with existing methods via a comprehensive simulation study. We apply the developed variable selection methods to the Virginia Lyme disease data and identify important variables that are new to the literature. Supplementary materials for this paper are available online.
Prior to adjustment, accounting conditions between national accounts data sets are frequently violated. Benchmarking is the procedure used by economic agencies to make such data sets consistent. It typically involves adjusting a high-frequency time series (e.g. quarterly data) so that it becomes consistent with a lower-frequency version (e.g. annual data). Various methods have been developed to approach this problem of inconsistency between data sets. This paper introduces a new statistical procedure, namely wavelet benchmarking. Wavelet properties allow high- and low-frequency processes to be jointly analysed, and we show that benchmarking can be formulated and approached succinctly in the wavelet domain. Furthermore, the time and frequency localisation properties of wavelets are ideal for handling more complicated benchmarking problems. The versatility of the procedure is demonstrated using simulation studies, where we provide evidence showing that it substantially outperforms currently used methods. Finally, we apply this novel method of wavelet benchmarking to official Office for National Statistics (ONS) data.
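One simple way to see why benchmarking fits naturally in the wavelet domain is that, for the orthonormal Haar wavelet, the level-2 approximation coefficient of four consecutive quarters equals half their annual sum. The sketch below uses that fact to impose annual totals while leaving the detail coefficients untouched. It is a toy illustration under stated assumptions, not the procedure developed in the paper: the Haar basis, the function names, and the example figures are all hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    return (x[0::2] + x[1::2]) / SQRT2, (x[0::2] - x[1::2]) / SQRT2


def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x


def haar_benchmark(quarterly, annual):
    """Force each year's quarters to sum to the annual benchmark.

    The level-2 approximation coefficient of a year equals (sum of its four
    quarters) / 2, so it is overwritten with annual / 2 while all detail
    coefficients (the within-year movements) are kept as they are.
    """
    quarterly = np.asarray(quarterly, dtype=float)
    annual = np.asarray(annual, dtype=float)
    if len(quarterly) != 4 * len(annual):
        raise ValueError("expected four quarters per annual benchmark")
    approx1, detail1 = haar_step(quarterly)
    _, detail2 = haar_step(approx1)
    approx2_new = annual / 2.0
    return haar_inverse_step(haar_inverse_step(approx2_new, detail2), detail1)


# Two years of quarterly data that disagree with the annual totals (made up).
quarterly = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 16.0])
annual = np.array([50.0, 60.0])
benchmarked = haar_benchmark(quarterly, annual)
```

With the Haar basis this reduces to spreading each year's discrepancy equally over its four quarters, which preserves within-year movements but can introduce steps between years; handling such effects is where a more careful wavelet treatment would be needed.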
The instrumental variable is an essential tool for addressing unmeasured confounding in observational studies. The two-stage predictor substitution (2SPS) estimator and the two-stage residual inclusion (2SRI) estimator are two commonly used approaches in applying instrumental variables. Recently, 2SPS was studied under the additive hazards model in the presence of competing risks for time-to-event data, where linearity was assumed for the relationship between the treatment and the instrumental variable. This assumption may not be the most appropriate when the treatment is binary. In this paper, we consider the 2SRI estimator under the additive hazards model for general survival data and in the presence of competing risks, which allows generalized linear models for the relationship between the treatment and the instrumental variable. We derive the asymptotic properties of the 2SRI estimator, including a closed-form asymptotic variance estimate. We carry out numerical studies in finite samples, and apply our methodology to the linked Surveillance, Epidemiology and End Results (SEER) - Medicare database, comparing radical prostatectomy versus conservative treatment in early-stage prostate cancer patients.