Consistency of the PLFit estimator for power-law data


Abstract

We prove the consistency of the Power-Law Fit (PLFit) method proposed by Clauset et al. (2009) for estimating the power-law exponent of data drawn from a distribution function with a regularly varying tail. In the complex systems community, PLFit has emerged as the method of choice for estimating the power-law exponent, yet its mathematical properties are still poorly understood. The difficulty is that PLFit is a minimum-distance estimator: it first chooses the threshold that minimizes the Kolmogorov-Smirnov distance between the data points larger than the threshold and the Pareto tail, and then applies the Hill estimator to this restricted data. Since the number of order statistics used is random, the general theory of consistency of power-law exponent estimators from extreme value theory does not apply. Our proof has two steps. First, we show that the Hill estimator is consistent for general intermediate sequences for the number of order statistics used, even when that number is random. Here, we call a sequence intermediate when it grows to infinity while remaining much smaller than the sample size. The second, and most involved, step is to prove that the optimizer in PLFit is, with high probability, an intermediate sequence, unless the distribution has a Pareto tail above a certain value. For this special case, we give a separate proof.
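The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the reference implementation of Clauset et al. or the construction used in the proof. The names `hill_estimator` and `plfit_sketch`, the parameter `k_min`, and the choice of the (k+1)-th largest observation as the threshold are assumptions made for illustration. The sketch assumes strictly positive data and reports the tail exponent (the reciprocal of the Hill estimate); in the convention of Clauset et al. the density exponent is larger by one.

```python
import numpy as np

def hill_estimator(sorted_desc, k):
    """Hill estimate of 1/(tail exponent) from the k largest values.

    sorted_desc must be sorted in decreasing order; the threshold is
    the (k+1)-th largest value, sorted_desc[k].
    """
    return np.mean(np.log(sorted_desc[:k]) - np.log(sorted_desc[k]))

def plfit_sketch(data, k_min=2):
    """Pick the number of upper order statistics k that minimizes the
    Kolmogorov-Smirnov distance to a fitted Pareto tail (illustrative only)."""
    x = np.sort(np.asarray(data, dtype=float))[::-1]  # decreasing order
    n = len(x)
    best = None
    for k in range(k_min, n):
        gamma = hill_estimator(x, k)
        if gamma <= 0:  # all top values tied with the threshold; skip
            continue
        # Rescale the k exceedances by the threshold; reverse to ascending order.
        y = (x[:k] / x[k])[::-1]
        model_cdf = 1.0 - y ** (-1.0 / gamma)  # Pareto CDF on [1, infinity)
        # Two-sided KS distance between empirical and model CDF.
        upper = np.arange(1, k + 1) / k
        lower = np.arange(0, k) / k
        ks = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
        if best is None or ks < best[0]:
            best = (ks, k, gamma)
    if best is None:
        raise ValueError("no valid threshold found (data may be constant)")
    ks, k, gamma = best
    return {"k": k, "threshold": x[k], "tail_exponent": 1.0 / gamma, "ks": ks}

if __name__ == "__main__":
    # Exact Pareto sample with tail exponent 2.5 (synthetic illustration data).
    rng = np.random.default_rng(0)
    sample = (1.0 - rng.random(2000)) ** (-1.0 / 2.5)
    print(plfit_sketch(sample))
```

The selected `k` is a function of the sample and hence random, which is why consistency results for the Hill estimator with a deterministic number of order statistics do not apply directly. The synthetic sample in the usage example has an exact Pareto tail, which corresponds to the special case the abstract treats separately.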
