With big data streams, the variable set of a model often changes with the condition of the data stream. In this paper, we propose a homogenization strategy to represent the heterogeneous models that are gradually updated as the data stream in. With the homogenized representations, we can easily construct various online updating statistics, such as parameter estimates, the residual sum of squares and the $F$-statistic, for the heterogeneous updating regression models. The main difference from the classical scenarios is that the artificial covariates in the homogenized models are not identically distributed as the natural covariates in the original models; consequently, the related theoretical properties are distinct from the classical ones. The asymptotic properties of the online updating statistics are established, showing that the new method achieves estimation efficiency and the oracle property without any constraint on the number of data batches. The behavior of the method is further illustrated by numerical examples from simulation experiments.
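For orientation, the classical fixed-variable-set case that the abstract contrasts with can be sketched as follows. This is a minimal illustration, not the homogenization strategy of the paper: it assumes every batch carries the same covariates, and the names `OnlineOLS`, `update`, `estimate`, `rss` and `solve` are hypothetical. It only shows how a parameter estimate and the residual sum of squares can be refreshed batch by batch from accumulated cross-products, without storing earlier batches.

```python
class OnlineOLS:
    """Online least squares for a FIXED variable set (an assumption;
    the paper's heterogeneous, homogenized models are not covered here)."""

    def __init__(self, p):
        self.p = p                                   # number of covariates
        self.n = 0                                   # observations seen so far
        self.xtx = [[0.0] * p for _ in range(p)]     # running X'X
        self.xty = [0.0] * p                         # running X'y
        self.yty = 0.0                               # running y'y

    def update(self, rows, ys):
        """Fold one batch into the running cross-products."""
        for x, y in zip(rows, ys):
            for i in range(self.p):
                self.xty[i] += x[i] * y
                for j in range(self.p):
                    self.xtx[i][j] += x[i] * x[j]
            self.yty += y * y
            self.n += 1

    def estimate(self):
        """Solve the normal equations X'X beta = X'y."""
        return solve(self.xtx, self.xty)

    def rss(self):
        """Residual sum of squares: y'y - beta' X'y."""
        beta = self.estimate()
        return self.yty - sum(b * v for b, v in zip(beta, self.xty))


def solve(a, b):
    """Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


# Two batches generated from y = 1 + 2x (intercept column first).
model = OnlineOLS(p=2)
model.update([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
model.update([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])
beta_hat = model.estimate()
rss_hat = model.rss()
```

Because only the cross-products are kept, memory use does not grow with the number of batches. Handling a variable set that changes between batches would need the homogenized representation described in the abstract, which this sketch does not attempt.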
This paper establishes unified frameworks of renewable weighted sums (RWS) for various online updating estimations in the models with streaming data sets. The newly defined RWS lays the foundation of online updating likelihood, online updating loss f
This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty H
In this paper we derive an updating scheme for calculating some important network statistics such as degree, clustering coefficient, etc., aiming to reduce the amount of computation needed to track the evolving behavior of large networks; and more im
Online image hashing has received increasing research attention recently, which processes large-scale data in a streaming fashion to update the hash functions on-the-fly. To this end, most existing works exploit this problem under a supervised settin
In big data research, one important issue is how to recover the sequentially changing sets of true features when the data sets arrive sequentially. The paper presents a general framework for online updating variable selection and par