بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Distributed sequential method for analyzing massive data

110 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yuan-chin Ivan Chang

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Zhanfeng Wang - Yuan-chin Ivan Chang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly integrate their results while maintaining the desired statistical properties. Additionally, using a criterion from the statistical experiment design, we adopt an adaptive sample selection, together with an adaptive shrinkage estimation method, to simultaneously accelerate the estimation procedure and identify the effective variables. We confirm the cogency of our methods through theoretical justifications and numerical results derived from synthesized data sets. We then apply the proposed method to three real data sets, including those pertaining to appliance energy use and particulate matter concentration.

قيم البحث

72 - Jun Yu , HaiYing Wang , Mingyao Ai 2020

Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the data volum e is so large that nonuniform subsampling probabilities cannot be calculated all at once, then subsampling with replacement is infeasible to implement. This paper solves this problem using Poisson subsampling. We first derive optimal Poisson subsampling probabilities in the context of quasi-likelihood estimation under the A- and L-optimality criteria. For a practically implementable algorithm with approximated optimal subsampling probabilities, we establish the consistency and asymptotic normality of the resultant estimators. To deal with the situation that the full data are stored in different blocks or at multiple locations, we develop a distributed subsampling framework, in which statistics are computed simultaneously on smaller partitions of the full data. Asymptotic properties of the resultant aggregated estimator are investigated. We illustrate and evaluate the proposed strategies through numerical experiments on simulated and real data sets.

المنهجية النظم الموزعة والتوازية والحوسبة العنقودية حساب

Adaptive Variable Selection for Sequential Prediction in Multivariate Dynamic Models

92 - Isaac Lavine , Michael Lindon , 2019

We discuss Bayesian model uncertainty analysis and forecasting in sequential dynamic modeling of multivariate time series. The perspective is that of a decision-maker with a specific forecasting objective that guides thinking about relevant models. B ased on formal Bayesian decision-theoretic reasoning, we develop a time-adaptive approach to exploring, weighting, combining and selecting models that differ in terms of predictive variables included. The adaptivity allows for changes in the sets of favored models over time, and is guided by the specific forecasting goals. A synthetic example illustrates how decision-guided variable selection differs from traditional Bayesian model uncertainty analysis and standard model averaging. An applied study in one motivating application of long-term macroeconomic forecasting highlights the utility of the new approach in terms of improving predictions as well as its ability to identify and interpret different sets of relevant models over time with respect to specific, defined forecasting goals.

المنهجية تطبيقات الإحصاء حساب

Bayesian Non-Parametric Inference for Infectious Disease Data

376 - Edward S. Knock , Theodore Kypraios 2014

We propose a framework for Bayesian non-parametric estimation of the rate at which new infections occur assuming that the epidemic is partially observed. The developed methodology relies on modelling the rate at which new infections occur as a functi on which only depends on time. Two different types of prior distributions are proposed namely using step-functions and B-splines. The methodology is illustrated using both simulated and real datasets and we show that certain aspects of the epidemic such as seasonality and super-spreading events are picked up without having to explicitly incorporate them into a parametric model.

المنهجية تطبيقات الإحصاء حساب

Bayesian nonstationary and nonparametric covariance estimation for large spatial data

78 - Brian Kidd , Matthias Katzfuss 2020

In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian sp atial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the number of spatial locations) entries of the covariance matrix, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matern-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model.

المنهجية تطبيقات الإحصاء حساب

A tractable Bayesian joint model for longitudinal and survival data

80 - Danilo Alvares , Francisco Javier Rubio 2021

We introduce a numerically tractable formulation of Bayesian joint models for longitudinal and survival data. The longitudinal process is modelled using generalised linear mixed models, while the survival process is modelled using a parametric genera l hazard structure. The two processes are linked by sharing fixed and random effects, separating the effects that play a role at the time scale from those that affect the hazard scale. This strategy allows for the inclusion of non-linear and time-dependent effects while avoiding the need for numerical integration, which facilitates the implementation of the proposed joint model. We explore the use of flexible parametric distributions for modelling the baseline hazard function which can capture the basic shapes of interest in practice. We discuss prior elicitation based on the interpretation of the parameters. We present an extensive simulation study, where we analyse the inferential properties of the proposed models, and illustrate the trade-off between flexibility, sample size, and censoring. We also apply our proposal to two real data applications in order to demonstrate the adaptability of our formulation both in univariate time-to-event data and in a competing risks framework. The methodology is implemented in rstan.

المنهجية تطبيقات الإحصاء حساب

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

معھد الشام العالي للعلوم الشرعية واللغة العربية والدراسات والبحوث الإسلامية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Distributed sequential method for analyzing massive data

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً