In this paper, we consider the problem of learning models with a latent factor structure. The focus is on what is possible and what is impossible when the usual strong-factor condition is not imposed. We study the minimax rate and adaptivity issues in two problems: pure factor models and panel regression with interactive fixed effects. For pure factor models, if the number of factors is known, we develop adaptive estimation and inference procedures that attain the minimax rate. However, when the number of factors is not specified a priori, we show that there is a tradeoff between validity and efficiency: any confidence interval that is uniformly valid for arbitrary factor strength has to be conservative; in particular, its width is bounded away from zero even when the factors are strong. Conversely, any data-driven confidence interval that does not require the exact number of factors (including weak ones) as an input and has shrinking width under strong factors cannot have uniform coverage: its worst-case coverage probability is at most 1/2. For panel regressions with interactive fixed effects, the tradeoff is much milder. We find that the minimax rate for learning the regression coefficient does not depend on the factor strength, and we propose a simple estimator that achieves this rate. However, when weak factors are allowed, uncertainty in the number of factors can cause a substantial loss of efficiency, although the rate is not affected. In most cases, we find that the strong-factor condition (and/or exact knowledge of the number of factors) improves efficiency, but this condition must be taken on faith and cannot be verified from the data for inference purposes.
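To make the role of factor strength concrete, here is a minimal simulation sketch, not the paper's procedure: it estimates a single factor by principal components and shows how accuracy degrades as the loading strength falls. The strength exponent alpha is an illustrative parameterization introduced here, with alpha = 1 corresponding to the usual strong-factor case.

```python
import numpy as np

# Minimal simulation sketch (not the paper's estimator): estimate a
# one-factor model X = lambda f' + e by principal components and see how
# the loading strength ||lambda||^2 ~ N^alpha affects estimation accuracy.
rng = np.random.default_rng(0)
N, T = 200, 200

def pca_factor_error(alpha):
    # loadings scaled so the total strength ||lambda||^2 is about N^alpha;
    # alpha = 1 is the "strong factor" case, alpha < 1 a weak factor
    lam = rng.normal(size=N) * np.sqrt(N**alpha / N)
    f = rng.normal(size=T)
    X = np.outer(lam, f) + rng.normal(size=(N, T))
    # the leading right singular vector of X estimates the factor path
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    f_hat = Vt[0] * np.sqrt(T)
    # sign-align the estimate and report its mean squared error
    s = np.sign(f_hat @ f)
    return np.mean((s * f_hat - f) ** 2)

for alpha in (1.0, 0.8, 0.6, 0.4):
    print(f"alpha = {alpha:.1f}: mean squared factor error = {pca_factor_error(alpha):.3f}")
```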
The problem of estimating the effect of missing higher orders in perturbation theory is analyzed, with emphasis on the application to Higgs production in gluon-gluon fusion. Well-known mathematical methods for the approximate completion of a perturbative series are applied with the goal of not truncating the series but completing it in a well-defined way, so as to increase the accuracy, if not the precision, of theoretical predictions. The uncertainty arising from the use of the completion procedure is discussed, and a recipe for constructing a corresponding probability distribution function is proposed.
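The abstract does not name its specific completion methods, so the following is a hedged toy illustration of one well-known device of this kind, a Padé approximant: from the same low-order coefficients that define a truncated series, it builds a rational function whose value can lie much closer to the full result.

```python
import numpy as np

# Illustrative only: a [1/1] Pade approximant applied to the toy series
# ln(1+x) = x - x^2/2 + x^3/3 - ..., truncated at second order.
c1, c2 = 1.0, -0.5          # low-order series coefficients of ln(1+x)

def truncated(x):
    return c1 * x + c2 * x**2

def pade_11(x):
    # matching x/(1 + b*x) to c1*x + c2*x^2 order by order gives b = -c2/c1
    b = -c2 / c1
    return c1 * x / (1.0 + b * x)

x = 1.0
print(f"exact ln(2)      = {np.log1p(x):.4f}")   # 0.6931
print(f"truncated series = {truncated(x):.4f}")  # 0.5000
print(f"[1/1] Pade       = {pade_11(x):.4f}")    # 0.6667, closer to exact
```

With only two coefficients as input, the rational completion roughly halves the error of the plain truncation at x = 1, which is the kind of accuracy gain the completion procedure aims for.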
The field of extrasolar planets has rapidly expanded to include the detection of planets with masses smaller than that of Uranus. Many of these are expected to have little or no hydrogen and helium gas, and we might find Earth analogs among them. In this paper we describe our detailed interior models for a rich variety of such massive terrestrial and ocean planets in the 1-to-10 Earth-mass range (super-Earths). The grid presented here allows the characterization of the bulk composition of super-Earths detected in transit with a measured mass. We show that, on average, radius measurements better than 5%, combined with mass measurements better than 10%, would permit us to distinguish between an icy and a rocky composition. This is because there is a maximum radius a rocky terrestrial planet can achieve for a given mass. Any radius above this maximum terrestrial radius implies that the planet contains a large (> 10%) amount of water (an ocean planet).
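As a toy illustration of the classification logic, not the paper's interior-model grid: assume the maximum rocky radius follows a power law R/R_earth = (M/M_earth)^beta, with the exponent beta = 0.27 chosen here purely for illustration; a measured radius significantly above this rocky maximum then flags a water-rich (ocean) planet.

```python
# Toy classification sketch (not the paper's interior-model grid): a rocky
# planet is assumed to obey R/R_earth = (M/M_earth)**BETA at most.
BETA = 0.27  # illustrative exponent, of the order such models produce

def max_rocky_radius(mass_earth):
    """Approximate maximum radius (Earth radii) of a purely rocky planet."""
    return mass_earth ** BETA

def classify(mass, radius, sigma_radius):
    """Flag a planet as likely water-rich if its radius exceeds the rocky
    maximum by more than one radius standard error."""
    r_max = max_rocky_radius(mass)
    return "likely icy/ocean" if radius - sigma_radius > r_max else "consistent with rocky"

# 5 Earth masses, radius measured to 5% as in the abstract's target precision
m, r = 5.0, 2.0
print(classify(m, r, 0.05 * r))   # r_max ~ 5**0.27 ~ 1.54 -> likely icy/ocean
```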
The galaxy power spectrum is one of the central quantities in cosmology. It contains information about the primordial inflationary process, matter clustering, the baryon-photon interaction, the effects of gravity, the galaxy-matter bias, the cosmic expansion, the peculiar velocity field, etc. Most of this information is, however, difficult to extract without assuming a specific cosmological model, for instance $\Lambda$CDM with standard gravity. In this paper we explore instead how much information can be obtained independently of the cosmological model, both at the background and at the linear perturbation level. We determine the full set of model-independent statistics that can be constructed by combining two redshift bins and two distinct tracers. We focus in particular on the statistic $r(k,z_1,z_2)$, defined as the ratio of $f\sigma_8(z)$ at two redshift shells, and we show how to estimate it with a Fisher matrix approach. Finally, we forecast the constraints on $r$ achievable by future galaxy surveys and compare them with the standard single-tracer result. We find that $r$ can be measured with a precision of 3% to 11%, depending on the survey. Using two tracers, we find improvements in the constraints of up to a factor of two.
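A schematic sketch of the Fisher-matrix step, with illustrative fiducial values and errors rather than the paper's survey specifications: given a Fisher matrix for $f\sigma_8$ at two redshifts, the constraint on the ratio $r$ follows from the Jacobian of the transformation.

```python
import numpy as np

# Schematic sketch (not the paper's forecast): propagate a Fisher matrix
# for p = (f*sigma8(z1), f*sigma8(z2)) to the model-independent ratio
# r = f*sigma8(z1) / f*sigma8(z2) via the Jacobian of the transformation.
p = np.array([0.45, 0.40])                     # illustrative fiducial values
F = np.diag([1.0 / 0.01**2, 1.0 / 0.012**2])   # illustrative 1-sigma errors

r = p[0] / p[1]
J = np.array([1.0 / p[1], -p[0] / p[1]**2])    # dr/dp
cov_p = np.linalg.inv(F)
sigma_r = np.sqrt(J @ cov_p @ J)
print(f"r = {r:.3f} +/- {sigma_r:.3f} ({100 * sigma_r / r:.1f}%)")
```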
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms.
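As a concrete instance of a local (randomized response) algorithm, here is the canonical bit-flipping mechanism, sketched with illustrative parameters: each user perturbs their own bit before release, which guarantees local differential privacy, and the aggregate statistic is debiased afterwards.

```python
import numpy as np

# Sketch of randomized response, the canonical local private algorithm:
# each user reports their true bit with probability e^eps / (1 + e^eps),
# which ensures eps-differential privacy at the individual level.
rng = np.random.default_rng(1)
eps = 1.0
p_true = np.exp(eps) / (1.0 + np.exp(eps))

bits = rng.random(100_000) < 0.3           # true sensitive bits, mean 0.3
flip = rng.random(bits.size) > p_true      # flip with probability 1 - p_true
reports = np.where(flip, ~bits, bits)

# debias the aggregate: E[report] = (2*p_true - 1)*mu + (1 - p_true)
mu_hat = (reports.mean() - (1.0 - p_true)) / (2.0 * p_true - 1.0)
print(f"true mean = {bits.mean():.3f}, private estimate = {mu_hat:.3f}")
```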
A general class of time-varying regression models is considered in this paper. We estimate the regression coefficients by local linear M-estimation. For these estimators, weak Bahadur representations are obtained and used to construct simultaneous confidence bands. For practical implementation, we propose a bootstrap-based method to circumvent the slow logarithmic convergence of the theoretical simultaneous bands. Our results substantially generalize and unify the treatment of several time-varying regression and autoregression models. The performance of the method for ARCH and GARCH models is studied in simulations, and a few real-life applications are presented through the analysis of some popular financial datasets.
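The following is a sketch of the general idea only, using least-squares loss (a special case of M-estimation) and illustrative bandwidth and kernel choices: fit a time-varying coefficient by local linear regression, then calibrate a simultaneous band by bootstrapping the maximal deviation over the time grid.

```python
import numpy as np

# Sketch of the general idea (least-squares loss, a special case of
# M-estimation): local linear fit of beta(t/T) in y_t = beta(t/T)*x_t + e_t,
# with a residual bootstrap of the sup-deviation for a simultaneous band.
rng = np.random.default_rng(2)
T, h = 400, 0.15                          # illustrative sample size, bandwidth
t = np.arange(T) / T
x = rng.normal(size=T)
beta = np.sin(2 * np.pi * t)              # true time-varying coefficient
y = beta * x + rng.normal(scale=0.5, size=T)

def local_linear(y, x, grid, h):
    est = np.empty(grid.size)
    for i, u in enumerate(grid):
        w = np.maximum(1 - ((t - u) / h) ** 2, 0)   # Epanechnikov kernel
        X = np.column_stack([x, x * (t - u)])       # local linear design
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        est[i] = coef[0]                            # local level of beta(u)
    return est

grid = np.linspace(0.1, 0.9, 30)
b_hat = local_linear(y, x, grid, h)
resid = y - np.interp(t, grid, b_hat) * x

# bootstrap the sup-deviation to calibrate a simultaneous (not pointwise) band
sups = []
for _ in range(200):
    y_star = np.interp(t, grid, b_hat) * x + rng.choice(resid, size=T)
    sups.append(np.max(np.abs(local_linear(y_star, x, grid, h) - b_hat)))
print(f"95% simultaneous band half-width: {np.quantile(sups, 0.95):.3f}")
```

Calibrating on the bootstrap distribution of the maximal deviation, rather than on its slowly converging Gumbel limit, is what sidesteps the logarithmic convergence mentioned above.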