
Collaborative work on unstructured or semi-structured documents, such as literature corpora or source code, often involves agreed-upon templates containing metadata. These templates are not consistent across users or over time. Rule-based parsing of these templates is expensive to maintain and tends to fail as new documents are added. Statistical techniques based on frequent occurrences have the potential to automatically identify a large fraction of the templates, thus reducing the burden on programmers. We investigate the case of the Project Gutenberg corpus, where most documents are in ASCII format with preambles and epilogues that are often copied and pasted or manually typed. We show that a statistical approach can solve most cases, though some documents require knowledge of English. We also survey various technical solutions that make our approach applicable to large data sets.
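The frequency-based idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply assumes that a line occurring in a large share of documents is template text. The function names `frequent_lines` and `strip_boilerplate`, the lowercase normalization, and the `min_fraction` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_lines(documents, min_fraction=0.5):
    """Return normalized lines that occur in at least min_fraction of the documents.

    Assumption: a line shared by many documents is template text, not content.
    """
    doc_count = Counter()
    for text in documents:
        # Count each distinct non-empty line once per document.
        seen = {line.strip().lower() for line in text.splitlines() if line.strip()}
        doc_count.update(seen)
    threshold = min_fraction * len(documents)
    return {line for line, n in doc_count.items() if n >= threshold}


def strip_boilerplate(text, boilerplate):
    """Drop every line whose normalized form is in the boilerplate set."""
    kept = [
        line for line in text.splitlines()
        if line.strip().lower() not in boilerplate
    ]
    return "\n".join(kept)


docs = [
    "*** START OF EBOOK ***\nCall me Ishmael.\n*** END ***",
    "*** START OF EBOOK ***\nIt was the best of times.\n*** END ***",
    "*** start of ebook ***\nAlice was beginning to get tired.",
]
frequent = frequent_lines(docs, min_fraction=0.6)
cleaned = strip_boilerplate(docs[0], frequent)
```

Exact line matching of this kind would miss manually typed variants of a template, which is one reason a purely statistical pass may leave some documents unresolved.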
Zewen Hu, Yishan Wu, 2015
Empirical analysis of the possible causes of non-citation may help researchers increase the likelihood that their work is cited and help journal editorial staff identify contributions of potentially high quality. In this study, we conduct a questionnaire-based survey on the possible causes of citation and non-citation. We then perform a statistical analysis of the survey data to identify the major causes of non-citation. Most respondents to our questionnaire identified eight major factors that make a paper easy to cite: research hotspots and novel topics, longer intervals after publication, research topics similar to the citing author's own work, high-quality content, reasonable self-citation, a highlighted title, prestigious authors, and academic tastes and interests similar to the citing author's own. They also pointed to the vast difference between their current and former research directions as the primary reason for their previously uncited papers. They feel that texts such as notes, comments, and letters to editors are rarely cited, and that the same is true of papers that are too short or too long. In comparison, reviews, articles, and papers of intermediate length are more easily cited.
Scientific journals are the repositories of the gradually accumulating knowledge of mankind about the world surrounding us. Just as our knowledge is organised into classes ranging from major disciplines, subjects and fields to increasingly specific topics, journals can also be categorised into groups using various metrics. In addition to being grouped by their characteristic topics, journals can be ranked by their overall influence. One widespread measure is the impact factor, but in the present paper we intend to reconstruct a much more detailed description by studying the hierarchical relations between the journals based on citation data. We use a measure related to the notion of m-reaching centrality and find a network which shows the level of influence of a journal in terms of the direction and efficiency with which information spreads through the network. We can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. The results are weakly methodology-dependent and reveal non-trivial relations among journals. The two alternative hierarchies show large similarity with some striking differences, together providing a complex picture of the intricate relations between scientific journals.
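As a rough illustration of the m-reaching idea mentioned above, the sketch below computes the fraction of other nodes that can be reached from a given node in at most m directed steps. This is the plain m-reach quantity, not the modified measure used in the paper, and the function name `m_reach_centrality`, the adjacency-dict representation, and the edge direction (here, an edge points in the direction information is assumed to flow) are all assumptions.

```python
from collections import deque


def m_reach_centrality(graph, source, m):
    """Fraction of other nodes reachable from source in at most m directed steps.

    graph maps each node to an iterable of its out-neighbours.
    """
    # Collect every node, including those that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if len(nodes) <= 1:
        return 0.0

    # Breadth-first search, cut off once the distance reaches m.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == m:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return (len(dist) - 1) / (len(nodes) - 1)


# Hypothetical four-journal chain: A -> B -> C -> D.
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
reach_a = m_reach_centrality(chain, "A", 2)
reach_d = m_reach_centrality(chain, "D", 2)
```

In this toy chain, the node at the head of the chain reaches more of the network within two steps than the node at the tail, which is the kind of asymmetry a hierarchy based on reach is meant to expose.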
The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking this one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and full text for much of the refereed Astronomy and Physics literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.
We revisit our recent study [Predicting results of the Research Excellence Framework using departmental h-index, Scientometrics, 2014, 1-16; arXiv:1411.1996] in which we attempted to predict outcomes of the UK's Research Excellence Framework (REF 2014) using the so-called departmental h-index. Here we report that our predictions failed to anticipate with any accuracy either overall REF outcomes or movements of individual institutions in the rankings relative to their positions in the previous Research Assessment Exercise (RAE 2008).
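For readers unfamiliar with the h-index, the following sketch shows the standard calculation: the largest h such that h papers have at least h citations each. Applying it to the pooled papers of a department is our assumption about what "departmental h-index" means; the abstract itself does not spell out the definition, and the function name `h_index` and the sample citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts are sorted in descending order, so no later paper can qualify.
            break
    return h


# Hypothetical citation counts for the pooled papers of one department.
department_papers = [10, 8, 5, 4, 3]
department_h = h_index(department_papers)
```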
Jian Du, Xiaoli Tang, 2015
F1000 recommendations have been validated as a potential data source for research evaluation, but the reasons for differences between the F1000 Article Factor (FFa scores) and citations remain to be explored. By linking 28254 publications in F1000 to citations in Scopus, we investigated the effect of research level and article type on the internal consistency of assessments based on citations and FFa scores. It turns out that research level has little impact, while article type has a large effect on the differences. The two measures differ significantly for two groups: non-primary research and evidence-based research publications are more highly cited than highly recommended, whereas translational research and transformative research publications are more highly recommended by faculty members but gather relatively fewer citations. This can be expected because citation is usually practiced by academic authors, while an article's potential for scientific revolution and its suitability for clinical practice should be judged from the practitioners' point of view. We conclude with a policy-relevant recommendation that the application of bibliometric approaches in research evaluation procedures should take into account the proportion of three types of publications: evidence-based research, transformative research, and translational research. The latter two types are more suitable for assessment through peer review.
Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences, we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. We analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual and that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time-dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, the relative citation impact is more likely to decrease with each subsequent publication. This pattern highlights the difficulty of repeatedly publishing high-impact research and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
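The Gini coefficient quoted above can be computed from a list of per-researcher totals. The sketch below uses the standard sorted-rank formula, G = 2 Σ i·x_i / (n Σ x_i) − (n + 1)/n, with values sorted in ascending order and i starting at 1. The function name `gini` and the sample data are hypothetical, and the paper's exact cohort construction is not reproduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no data or all zeros counts as equal.
        return 0.0
    # Rank-weighted sum over the ascending-sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical cohorts: identical totals versus one researcher holding everything.
equal_cohort = gini([5, 5, 5, 5])
concentrated_cohort = gini([0, 0, 0, 12])
```

With this formula, a finite cohort in which a single researcher holds all citations yields (n − 1)/n rather than exactly 1, so small cohorts cap slightly below the theoretical maximum.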
Amir Razmjou, 2014
In previous studies, much attention from multidisciplinary fields has been devoted to understanding the mechanisms underlying scholarly networks, including bibliographic networks, citation networks, and co-citation networks. These studies focus particularly on networks constructed from either authors' affinities or mutual content, and so miss a valuable dimension of the network: the audience of a scholarly paper. In this paper we aim to assess the impact that social networks and media can have on scholarly papers. We also examine the process of information flow in such networks and mention some observations of interesting incidents that our proposed network model revealed.
Rob Seaman, 2014
From the moment astronomical observations are made, the resulting data products begin to grow stale. Even if perfect binary copies are preserved through repeated timely migration to more robust storage media, data standards evolve and new tools are created that require different kinds of data or metadata. The expectations of the astronomical community change even if the data do not. We discuss data engineering to mitigate the ensuing risks, with examples from a recent project to refactor seven million archival images to new standards of nomenclature, metadata, format, and compression.
We demonstrate a comprehensive framework that accounts for the citation dynamics of scientific papers and for the age distribution of references. We show that the citation dynamics of scientific papers is nonlinear and that this nonlinearity has far-reaching consequences, such as diverging citation distributions and runaway papers. We propose a nonlinear stochastic dynamic model of citation dynamics based on a link copying/redirection mechanism. The model is fully calibrated by empirical data and does not contain free parameters. This model can serve as a basis for quantitative probabilistic prediction of the citation dynamics of individual papers and of the journal impact factor.