Mean profiles are widely used as indicators of the electricity consumption habits of customers. Currently, at Électricité de France (EDF), class load profiles are estimated with the point-wise mean function. Unfortunately, the mean is well known to be highly sensitive to outliers, such as one or more consumers with unusually high levels of consumption. In this paper, we propose a more robust alternative to the mean profile: the $L_1$-median profile. When dealing with large datasets of functional data (load curves, for example), survey sampling approaches are useful for estimating the median profile without storing the whole dataset. We propose estimators of the median trajectory under several sampling strategies, and we compare them on a test population. We develop a stratification based on the linearized variable that substantially improves the accuracy of the estimator compared with simple random sampling without replacement. We also suggest an improved estimator that takes auxiliary information into account. Some potential areas for future research are also highlighted.
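As an informal illustration of the robustness argument, the Python sketch below computes the point-wise mean and the $L_1$-median (via Weiszfeld's algorithm) of synthetic discretized load curves contaminated by a few high-consumption outliers. It is a toy example with made-up data, not EDF's estimator and not the survey-sampling machinery of the paper.

```python
import numpy as np

def l1_median(curves, tol=1e-8, max_iter=500):
    """Weiszfeld's algorithm for the L1-median of discretized curves.

    curves: array of shape (n_customers, n_time_points).
    """
    m = curves.mean(axis=0)                      # start from the mean profile
    for _ in range(max_iter):
        dist = np.linalg.norm(curves - m, axis=1)
        dist = np.maximum(dist, tol)             # guard against division by zero
        w = 1.0 / dist
        m_new = (w[:, None] * curves).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

# Synthetic example: 1000 daily curves on 48 half-hourly points, with a few
# customers whose consumption is inflated twenty-fold.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 48)
truth = 1 + np.sin(2 * np.pi * t)
curves = truth + 0.2 * rng.standard_normal((1000, 48))
curves[:5] *= 20
print("mean profile, max abs error:     ", np.abs(curves.mean(axis=0) - truth).max())
print("L1-median profile, max abs error:", np.abs(l1_median(curves) - truth).max())
```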
We provide a MATLAB toolbox, BFDA, that implements a Bayesian hierarchical model to smooth multiple functional data under the assumption that they share the same underlying Gaussian process distribution, with a Gaussian process prior for the mean function and an Inverse-Wishart process prior for the covariance function. This model-based approach borrows strength across all functional data to increase smoothing accuracy, and it estimates the mean and covariance functions simultaneously. BFDA includes an option to approximate the Bayesian inference with cubic B-spline basis functions, which allows high-dimensional functional data to be handled efficiently. Examples of using BFDA in various scenarios and of conducting follow-up functional regression are provided. The advantages of BFDA include: (1) it simultaneously smooths multiple functional data and estimates the mean and covariance functions nonparametrically; (2) it flexibly handles sparse and high-dimensional functional data with stationary or nonstationary covariance functions, and without requiring a common observation grid; and (3) it provides accurately smoothed functional data for follow-up analysis.
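The Python sketch below conveys the borrowing-of-strength idea in a crude empirical form: each curve is smoothed with the conditional mean of a Gaussian-process model whose mean and covariance are pooled across all curves. It is not the BFDA interface and not the full Bayesian hierarchical model (no Inverse-Wishart prior, no B-spline approximation); the data, noise level, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 60)
n_curves, sigma2 = 200, 0.2 ** 2
true = np.sin(2 * np.pi * t)[None, :] + 0.3 * rng.standard_normal((n_curves, 1))
Y = true + np.sqrt(sigma2) * rng.standard_normal((n_curves, len(t)))

mu = Y.mean(axis=0)                        # pooled mean function
S = np.cov(Y, rowvar=False)                # pooled covariance of the noisy curves
evals, evecs = np.linalg.eigh(S)
C = evecs @ np.diag(np.clip(evals - sigma2, 0, None)) @ evecs.T  # signal covariance (PSD)

# Conditional mean of each curve given its noisy observations under the shared
# Gaussian-process model  y_i ~ N(mu, C + sigma2 * I).
K = C + sigma2 * np.eye(len(t))
smoothed = mu + (C @ np.linalg.solve(K, (Y - mu).T)).T
print("raw mean abs error:     ", np.abs(Y - true).mean())
print("smoothed mean abs error:", np.abs(smoothed - true).mean())
```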
Despite the great promise of machine-learning algorithms for classifying and predicting astrophysical parameters of the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those in more extensive, deeper surveys (the testing data). This sample-selection bias can cause catastrophic errors in predictions on the testing data because (a) standard assumptions underlying machine-learned model selection procedures break down and (b) dense regions of the testing space may be completely devoid of training data. We explore possible remedies to sample-selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL, in which the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up, is an effective approach that is appropriate for many astronomical applications. For a variable-star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid manual labeling of variable stars, we developed a web interface that allows easy light-curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence on the testing set, from 14.6% to 42.9%, after a few AL iterations.
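A generic pool-based active-learning loop with uncertainty sampling is sketched below in Python. It is only a schematic stand-in for the AL strategy described above: the paper's query criterion, classifier, and features differ, and the oracle labels here stand in for manual follow-up of the queried light curves.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def active_learning(X_train, y_train, X_pool, y_oracle, n_queries=50):
    """Iteratively add the pool object the current model is least confident about."""
    X_tr, y_tr = X_train.copy(), y_train.copy()
    pool_idx = np.arange(len(X_pool))
    for _ in range(n_queries):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X_tr, y_tr)
        proba = clf.predict_proba(X_pool[pool_idx])
        q = pool_idx[np.argmin(proba.max(axis=1))]   # least-confident query
        X_tr = np.vstack([X_tr, X_pool[q]])
        y_tr = np.append(y_tr, y_oracle[q])          # "manual" label from the oracle
        pool_idx = pool_idx[pool_idx != q]
    return clf
```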
In industrial engineering, a cold-standby redundancy allocation strategy is usually adopted to improve the reliability of coherent systems. This paper investigates optimal allocation strategies of cold standbys for series and parallel systems composed of dependent components with left/right tail weakly stochastic arrangement increasing lifetimes. For the case of heterogeneous and independent matched cold standbys, it is proved that better redundancies should be allocated to the nodes with weaker [better] components for series [parallel] systems. For the case of homogeneous and independent cold standbys, it is shown that more redundancies should be allocated to the weaker [better] components to enhance the reliability of series [parallel] systems. The results developed here generalize and extend the corresponding results in the literature to series and parallel systems with dependent components. Numerical examples are also presented to provide guidance for the practical use of our theoretical findings.
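The series-system principle (pair the better standby with the weaker component) can be checked numerically. The Monte Carlo sketch below uses independent exponential lifetimes, which is a simplification of the dependent setting treated in the paper; all rates and the mission time are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
weak   = rng.exponential(scale=1 / 2.0, size=n)   # weaker component   (mean 0.5)
strong = rng.exponential(scale=1 / 1.0, size=n)   # stronger component (mean 1.0)
good   = rng.exponential(scale=1 / 0.5, size=n)   # better standby     (mean 2.0)
poor   = rng.exponential(scale=1 / 1.5, size=n)   # weaker standby     (mean 0.67)

t = 1.5  # mission time
# Cold standby: a node's lifetime is the component lifetime plus its standby's.
alloc_A = np.minimum(weak + good, strong + poor)  # better standby with weaker component
alloc_B = np.minimum(weak + poor, strong + good)  # better standby with stronger component
print("series reliability, allocation A:", (alloc_A > t).mean())
print("series reliability, allocation B:", (alloc_B > t).mean())
```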
Despite an increasing reliance on fully automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As is frequently the case in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While a fast-growing literature evaluates the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial evaluating the pretrial Public Safety Assessment (PSA) in the criminal justice system. One goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields some suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels, while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in the judge's decisions. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
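For readers unfamiliar with the experimental setup, a bare-bones intention-to-treat comparison for a two-arm randomized trial is sketched below in Python. It is only an illustrative difference-in-means with made-up column names; the paper's methodology additionally handles ordinal judicial decisions, fairness measures, and optimal decision rules.

```python
import numpy as np
import pandas as pd

def itt_effect(df, treat="psa_provided", outcome="released"):
    """Difference in means between the PSA-provided and control arms, with a 95% CI."""
    y1 = df.loc[df[treat] == 1, outcome].to_numpy(dtype=float)
    y0 = df.loc[df[treat] == 0, outcome].to_numpy(dtype=float)
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Subgroup comparisons (e.g. female vs. male arrestees) reuse the same function:
#   itt_effect(df[df["gender"] == "female"]); itt_effect(df[df["gender"] == "male"])
```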
Motivated by the problem of colocalization analysis in fluorescence microscopic imaging, we study in this paper structured detection of correlated regions between two random processes observed on a common domain. We argue that, although intuitive, direct use of the maximum log-likelihood statistic suffers from potential bias and substantially reduced power, and we introduce a simple size-based normalization to overcome this problem. We show that scanning with the proposed size-corrected likelihood ratio statistic leads to optimal correlation detection over a large collection of structured correlation detection problems.
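A brute-force one-dimensional scan illustrating the idea is sketched below in Python. The Gaussian likelihood-ratio statistic per window is standard, but the size normalization used here (dividing by the square root of the window length) is only a placeholder; the paper's size correction differs, and a practical scan would avoid the quadratic enumeration of windows.

```python
import numpy as np

def scan_correlated_region(x, y, min_len=20):
    """Return the best (score, (start, end)) interval under a size-adjusted LR scan."""
    n = len(x)
    best = (-np.inf, None)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            r = np.clip(np.corrcoef(x[start:end], y[start:end])[0, 1], -0.999, 0.999)
            m = end - start
            llr = -0.5 * m * np.log(1 - r ** 2)   # Gaussian log-LR for rho != 0
            score = llr / np.sqrt(m)              # placeholder size normalization
            if score > best[0]:
                best = (score, (start, end))
    return best

# Toy example: the two processes are correlated only on [150, 220).
rng = np.random.default_rng(3)
x, y = rng.standard_normal(300), rng.standard_normal(300)
y[150:220] = 0.8 * x[150:220] + 0.6 * y[150:220]
print(scan_correlated_region(x, y))
```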