ﻻ يوجد ملخص باللغة العربية
This paper develops an incremental learning algorithm based on quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both offline QIF and offline generalized estimating equations (GEE) approach that process the entire cumulative subject-level data, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential goodness-of-fit test as a screening procedure on occurrences of abnormal data batches. We implement the proposed methodology by expanding existing Sparks Lambda architecture for the operation of statistical inference and data quality diagnosis. We illustrate the proposed methodology by extensive simulation studies and an analysis of streaming car crash datasets from the National Automotive Sampling System-Crashworthiness Data System (NASS CDS).
While experiments on fusion plasmas produce high-dimensional data time series with ever increasing magnitude and velocity, data analysis has been lagging behind this development. For example, many data analysis tasks are often performed in a manual,
Correlated data are ubiquitous in todays data-driven society. A fundamental task in analyzing these data is to understand, characterize and utilize the correlations in them in order to conduct valid inference. Yet explicit regression analysis of corr
A population-averaged additive subdistribution hazard model is proposed to assess the marginal effects of covariates on the cumulative incidence function to analyze correlated failure time data subject to competing risks. This approach extends the po
Spatial regression or geographically weighted regression models have been widely adopted to capture the effects of auxiliary information on a response variable of interest over a region. In contrast, relationships between response and auxiliary varia
This paper investigates the problem of making inference about a parametric model for the regression of an outcome variable $Y$ on covariates $(V,L)$ when data are fused from two separate sources, one which contains information only on $(V, Y)$ while