
Empirical Bayes approaches to PageRank type algorithms for rating scientific journals

 Added by Julie Josse
 Publication date 2017
Research language: English





Following criticisms of the journal Impact Factor, new journal influence scores have been developed, such as the Eigenfactor or the Prestige Scimago Journal Rank. They are based on PageRank-type algorithms applied to the cross-citation transition matrix of the citing-cited network. The PageRank algorithm smooths the transition matrix by combining a random walk on the data network with a teleportation to all possible nodes with fixed probabilities (the damping factor being $\alpha = 0.85$). We reinterpret this smoothing matrix as the mean of a posterior distribution of a Dirichlet-multinomial model in an empirical Bayes perspective. We suggest a simple yet efficient way to make a clear distinction between structural and sampling zeroes. This allows us to contrast cases with self-citations included or excluded to avoid overvalued journal bias. We estimate the model parameters by maximizing the marginal likelihood with a Majorize-Minimize algorithm. The procedure yields a score similar to the PageRank scores, but with a damping factor that depends on each journal. The procedures are illustrated with an example about cross-citations among 47 statistical journals studied by Varin et al. (2016).
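The link between PageRank smoothing and a Dirichlet-multinomial posterior mean can be sketched in a few lines of Python. This is only an illustration of the standard conjugacy identity, not the authors' procedure: the toy citation counts, the uniform prior mean, the single shared concentration value, and all function names are assumptions made for the example, and the sketch omits the structural-zero handling and the Majorize-Minimize estimation described in the abstract.

```python
def row_normalize(counts):
    """Turn citation counts (row = citing journal) into transition rows."""
    rows = []
    for row in counts:
        total = sum(row)
        # Assumption: a journal with no outgoing citations gets a uniform row.
        rows.append([c / total for c in row] if total > 0
                    else [1.0 / len(row)] * len(row))
    return rows


def pagerank_smoothing(counts, teleport, alpha=0.85):
    """Classical smoothing: alpha * random walk + (1 - alpha) * teleportation."""
    return [[alpha * p + (1 - alpha) * t for p, t in zip(row, teleport)]
            for row in row_normalize(counts)]


def dirichlet_posterior_mean(counts, prior_mean, concentration):
    """Posterior mean of each row when the row of counts is multinomial and
    its probabilities have a Dirichlet(concentration * prior_mean) prior."""
    out = []
    for row in counts:
        n = sum(row)
        out.append([(x + concentration * m) / (n + concentration)
                    for x, m in zip(row, prior_mean)])
    return out


def implied_damping(counts, concentration):
    """Row-specific damping n_i / (n_i + concentration) implied by the posterior mean."""
    return [sum(row) / (sum(row) + concentration) for row in counts]


def stationary(matrix, iters=200):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return v


# Hypothetical toy data: 3 journals, self-citations excluded (zero diagonal).
counts = [[0, 30, 10],
          [20, 0, 5],
          [2, 3, 0]]
prior_mean = [1 / 3, 1 / 3, 1 / 3]   # assumed teleportation / prior mean
concentration = 5.0                   # assumed prior strength

fixed = pagerank_smoothing(counts, prior_mean, alpha=0.85)
posterior = dirichlet_posterior_mean(counts, prior_mean, concentration)
alphas = implied_damping(counts, concentration)
scores = stationary(posterior)
```

In this sketch the posterior mean of row $i$ equals $\alpha_i$ times the observed transition row plus $(1 - \alpha_i)$ times the prior mean, with $\alpha_i = n_i / (n_i + a)$, where $n_i$ is the number of citations made by journal $i$ and $a$ is the concentration. With the toy counts above the implied damping factors are about 0.89, 0.83 and 0.50, so a journal with few outgoing citations is smoothed more strongly than one with many.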




Read More

Xiuwen Duan, 2021
Empirical Bayes methods have been around for a long time and have a wide range of applications. These methods provide a way in which historical data can be aggregated to provide estimates of the posterior mean. This thesis revisits some of the empirical Bayesian methods and develops new applications. We first look at a linear empirical Bayes estimator and apply it to ranking and symbolic data. Next, we consider Tweedie's formula and show how it can be applied to analyze a microarray dataset. The application of the formula is simplified with the Pearson system of distributions. Saddlepoint approximations enable us to generalize several results in this direction. The results show that the proposed methods perform well in applications to real data sets.
The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i=1,\ldots, n$, is a key research problem that has received renewed attention in the high-dimensional setting. The classic example involves estimating a vector of normal means $\mu_i$ subject to a fixed variance term $\sigma^2$. However, many practical situations involve heterogeneous data $(x_i, \theta_i)$ where $\theta_i$ is a known nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the Nonparametric Empirical Bayes Smoothing Tweedie (NEST) estimator, which efficiently estimates $\eta_i$ and properly adjusts for heterogeneity by approximating the marginal density of the data $f_{\theta_i}(x_i)$ and applying this density via a generalized version of Tweedie's formula. NEST is capable of handling a wider range of settings than previously proposed heterogeneous approaches as it does not make any parametric assumptions on the prior distribution of $\eta_i$. The estimation framework is simple but general enough to accommodate any member of the exponential family of distributions; a thorough study of the normal means problem subject to heterogeneous variances is presented to illustrate the proposed framework. Our theoretical results show that NEST is asymptotically optimal, while simulation studies show that it outperforms competing methods, with substantial efficiency gains in many settings. The method is demonstrated on a data set measuring the performance gap in math scores between socioeconomically advantaged and disadvantaged students in K-12 schools.
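As a small illustration of Tweedie's formula in its simplest setting, the sketch below checks the normal-means case $E[\mu \mid x] = x + \sigma^2 \, \frac{d}{dx} \log f(x)$ against the closed-form posterior mean. It is not the NEST estimator: the marginal density is taken as known rather than estimated nonparametrically, the variance is homogeneous, and the prior variance, the observation, and the function names are assumptions chosen for the example.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance var."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def tweedie_estimate(x, sigma2, log_marginal, h=1e-5):
    """Tweedie's formula for normal observations with known variance sigma2.

    The derivative of the log marginal density is approximated by a
    central finite difference with step h.
    """
    score = (log_marginal(x + h) - log_marginal(x - h)) / (2 * h)
    return x + sigma2 * score


# Assumed toy model: mu ~ N(0, tau2) and x | mu ~ N(mu, sigma2),
# so the marginal distribution of x is N(0, sigma2 + tau2).
sigma2, tau2 = 1.0, 4.0


def log_marginal(x):
    return math.log(normal_pdf(x, sigma2 + tau2))


x_obs = 2.5
tweedie = tweedie_estimate(x_obs, sigma2, log_marginal)
closed_form = x_obs * tau2 / (sigma2 + tau2)  # conjugate posterior mean
```

In this conjugate case both quantities agree (up to finite-difference error), which shows how the formula recovers the shrinkage estimate using only the marginal density of the data.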
Rank data arises frequently in marketing, finance, organizational behavior, and psychology. Most analysis of rank data reported in the literature assumes the presence of one or more variables (sometimes latent) based on whose values the items are ranked. In this paper we analyze rank data using a purely probabilistic model where the observed ranks are assumed to be perturbe
Scientific journals are the repositories of the gradually accumulating knowledge of mankind about the world surrounding us. Just as our knowledge is organised into classes ranging from major disciplines, subjects and fields to increasingly specific topics, journals can also be categorised into groups using various metrics. Beyond the set of topics characteristic of each journal, journals can also be ranked by relevance in terms of their overall influence. One widespread measure is the impact factor, but in the present paper we intend to reconstruct a much more detailed description by studying the hierarchical relations between the journals based on citation data. We use a measure related to the notion of m-reaching centrality and find a network which shows the level of influence of a journal in terms of the direction and efficiency with which information spreads through the network. We can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. The results are weakly methodology-dependent and reveal non-trivial relations among journals. The two alternative hierarchies are largely similar, with some striking differences, and together provide a complex picture of the intricate relations between scientific journals.
Nonparametric empirical Bayes methods provide a flexible and attractive approach to high-dimensional data analysis. One particularly elegant empirical Bayes methodology, involving the Kiefer-Wolfowitz nonparametric maximum likelihood estimator (NPMLE) for mixture models, has been known for decades. However, implementation and theoretical analysis of the Kiefer-Wolfowitz NPMLE are notoriously difficult. A fast algorithm was recently proposed that makes NPMLE-based procedures feasible for use in large-scale problems, but the algorithm calculates only an approximation to the NPMLE. In this paper we make two contributions. First, we provide upper bounds on the convergence rate of the approximate NPMLE's statistical error, which have the same order as the best known bounds for the true NPMLE. This suggests that the approximate NPMLE is just as effective as the true NPMLE for statistical applications. Second, we illustrate the promise of NPMLE procedures in a high-dimensional binary classification problem. We propose a new procedure and show that it vastly outperforms existing methods in experiments with simulated data. In real data analyses involving cancer survival and gene expression data, we show that it is very competitive with several recently proposed methods for regularized linear discriminant analysis, another popular approach to high-dimensional classification.