Research papers and Master's and PhD theses published by Orion Penner

Inequality and cumulative advantage in science careers: a case study of high-impact journals

260 - Alexander M. Petersen, Orion Penner 2014

Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences, we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about inequality at the level of individual researchers. We analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms, acting at the level of the individual, that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time-dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication. This pattern highlights the difficulty of repeatedly publishing high-impact research and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
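The Gini coefficient quoted above can be computed directly from a list of per-researcher totals (publications or citations). The sketch below uses the standard sorted-values formula; the function name `gini` and the example numbers are illustrative assumptions, and it is not necessarily the exact estimator (for example, with or without a small-sample correction) used in the study.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means perfect equality; (n - 1) / n means a single entry holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention: no inequality when there is nothing to share
    # G = 2 * sum_i(i * x_i) / (n * sum_i x_i) - (n + 1) / n, x sorted ascending, i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical citation totals for one small cohort of researchers
print(round(gini([3, 5, 8, 40, 250]), 3))
```

Equal totals give 0, and concentrating everything in one researcher gives the maximum (n - 1) / n, so values such as 0.73 for citations indicate that a small fraction of researchers hold most of the citations.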

Physics and Society; Digital Libraries

On the Predictability of Future Impact in Science

100 - Orion Penner, Raj Kumar Pan, Alexander M. Petersen 2013

Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle but critical flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their predictive power. Moreover, the predictive power of these models depends heavily on scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models and indicate that further investigation is required before they can be used in recruiting decisions.
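For reference, a minimal h-index computation is sketched below; `h_index` is a hypothetical helper, not code from the paper. Because citation counts only accumulate and papers are never removed from a record, the h-index at a later date can never be smaller than at an earlier one, so h(t) is automatically contained in h(t+Δt). This is one way to see the built-in autocorrelation the abstract refers to.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts are descending, so no later paper can qualify
    return h


# Hypothetical citation counts for one career at two dates: existing papers only
# gain citations and a new paper has been added, so h cannot go down.
early = [6, 4, 3, 1]
later = [9, 7, 5, 4, 1]
assert h_index(later) >= h_index(early)
```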

Physics and Society; Digital Libraries

The case for caution in predicting scientists future impact

98 - Orion Penner, Raj K. Pan, Alexander M. Petersen 2013

We stress-test the career predictability model proposed by Acuna et al. [Nature 489, 201-202 (2012)] by applying their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. The Acuna model claims to predict h(t+Δt), a scientist's h-index Δt years into the future, using a linear combination of 5 cumulative career measures taken at career age t. Here we investigate how the predictability depends on the aggregation of career data across multiple age cohorts. We confirm that the Acuna model does a respectable job of predicting h(t+Δt) up to roughly 6 years into the future when aggregating all age cohorts together. However, when calculated using subsets of specific age cohorts (e.g. using data for only t=3), we find that the model's predictive power significantly decreases, especially when applied to early career years. For young careers, the model does a much worse job of predicting future impact, which exposes a serious limitation. The limitation is particularly concerning because early-career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.
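The effect of pooling age cohorts can be illustrated with ordinary least squares on toy data. This is not the Acuna et al. model: it uses a single made-up predictor instead of their 5 career measures, and the helper names `r_squared` and `r_squared_by_cohort` are hypothetical. When cohorts differ mainly in their overall level, a pooled fit largely captures those between-cohort differences and reports a high R², while fits within each cohort can be much weaker.

```python
import numpy as np


def r_squared(X, y):
    """Fit y ~ X (with intercept) by ordinary least squares and return R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def r_squared_by_cohort(X, y, ages):
    """R^2 computed separately for each career age t, one cohort at a time."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ages = np.asarray(ages)
    return {int(t): r_squared(X[ages == t], y[ages == t]) for t in np.unique(ages)}
```

For example, with `x = [1, 2, 3, 11, 12, 13]`, `y = [2, 1, 3, 12, 11, 13]` and career ages `[1, 1, 1, 5, 5, 5]`, the pooled R² is about 0.97, while each cohort on its own gives 0.25.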

Physics and Society; Digital Libraries; Data Analysis, Statistics and Probability

Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

123 - Orion Penner, Peter Grassberger, Maya Paczuski 2010

Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI), which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the normalized compression distance. So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives estimates for animal mitochondrial DNA that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test sever
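The normalized compression distance mentioned above is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length of its argument. A minimal sketch follows, assuming zlib as the compressor; it computes the plain NCD only, not the modified, additive variant the abstract proposes, and the helper names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for near-identical inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # "concatenating and zipping"
    return (cxy - min(cx, cy)) / max(cx, cy)
```

zlib can only refer back 32 KB when looking for repeats, so this sketch assumes each sequence is shorter than that; otherwise the second sequence cannot be matched against the first.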

Genomics

88 - Alexander M. Petersen, Orion Penner, H. Eugene Stanley 2010

There is a long-standing debate over how to objectively compare the career achievements of professional athletes from different historical eras. Developing an objective approach will be of particular importance over the next decade as Major League Baseball (MLB) players from the steroids era become eligible for Hall of Fame induction. Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from both exogenous and endogenous factors, such as talent dilution from expansion, equipment and training improvements, as well as performance enhancing drugs (PED). In this paper we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics for five statistical categories -- hits (H), home runs (HR), runs batted in (RBI), wins (W) and strikeouts (K) -- over the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional baseball arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. Using this simple detrending technique, we examine the top 50 all-time careers for H, HR, RBI, W and K. We fit the pdfs for career success by the Gamma distribution in order to calculate objective benchmarks based on extreme statistics which can be used for the identification of extraordinary careers.
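A minimal sketch of normalizing seasonal statistics to seasonal averages follows. The rescaling rule (multiply each value by the ratio of a baseline to that season's league average, with the baseline taken as the mean of all seasonal averages), the `(player, season, value)` record layout, and the helper names are assumptions for illustration; the paper's exact normalization may differ, for example in how averages are weighted by playing time.

```python
from collections import defaultdict


def detrend(records):
    """Rescale (player, season, value) records by their season's league average.

    Assumed rule: value * baseline / season_average, where baseline is the
    mean of all seasonal averages.
    """
    if not records:
        return []
    by_season = defaultdict(list)
    for _, season, value in records:
        by_season[season].append(value)
    season_avg = {s: sum(vals) / len(vals) for s, vals in by_season.items()}
    baseline = sum(season_avg.values()) / len(season_avg)
    out = []
    for player, season, value in records:
        avg = season_avg[season]
        # A season whose average is 0 contains only zeros, which stay 0.
        out.append((player, season, value * baseline / avg if avg else 0.0))
    return out


def career_totals(records):
    """Sum (possibly detrended) seasonal values into per-player career totals."""
    totals = defaultdict(float)
    for player, _, value in records:
        totals[player] += value
    return dict(totals)
```

In a toy example where a 1990 season averages 20 and a 2000 season averages 40, a player with 10 in 1990 and a player with 20 in 2000 both come out at 15, reflecting equal performance relative to their eras.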

Physics and Society; Data Analysis, Statistics and Probability; Popular Physics

Sequence alignment and mutual information

229 - Orion Penner , Peter Grassberger , 2008

Background: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. All existing alignment algorithms rely on heuristic scoring schemes based on biological expertise. Therefore, these algorithms do not provide model-independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure -- the mutual information (MI) -- previous attempts to connect sequence alignment and information theory have not produced realistic estimates for the MI from a given alignment. Results: Here we describe a simple and flexible approach to get robust estimates of MI from global alignments. For mammalian mitochondrial DNA, our approach gives pairwise MI estimates for commonly used global alignment algorithms that are strikingly close to estimates obtained by an entirely unrelated approach -- concatenating and zipping the sequences. Conclusions: This remarkable consistency may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments. We expect that our approach can be extended to establish further connections between information theory and sequence alignment, including applications to local and multiple alignment procedures.
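As a point of comparison, the simplest way to turn a pairwise alignment into an MI number is a plug-in (Shannon) estimate over aligned columns, sketched below. This is a deliberately naive stand-in, not the approach described in the abstract: it treats columns as independent samples, skips gap columns, and is biased upward for short alignments. The name `alignment_mi` and the `"-"` gap symbol are assumptions.

```python
import math
from collections import Counter


def alignment_mi(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Plug-in mutual information (bits) between aligned residues, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Two identical alignments of `"ACGTACGT"` give 2 bits (each base fully determines its partner among four equally frequent letters), while `"AACC"` against `"ACAC"` gives 0 bits.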

Genomics
