The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged from new technologies that made document creation and dissemination easy, and from cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made using technologies that were not previously available. Based on this, we argue that a modern arXiv might in fact not look at all like the arXiv of today.
We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the articles' source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. The terms can then be organized in a dependency graph by using the terms' definitions and the arXiv's metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the text's vector representations and how these embeddings capture relations of entailment between mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover (ITP) libraries of formally verified statements.
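As a rough illustration of how a dependency graph could be derived from term definitions, the following minimal sketch links each term to every other known term that appears in its definition. This is not the ArGoT pipeline: the function name `build_dependency_graph`, the whole-word matching rule, and the three toy definitions are all assumptions made for the example.

```python
import re


def build_dependency_graph(definitions):
    """Map each term to the set of other known terms mentioned in its definition.

    Hypothetical helper: matching is a case-insensitive whole-word search,
    which is only a stand-in for whatever extraction a real pipeline uses.
    """
    graph = {}
    for term, text in definitions.items():
        deps = set()
        for other in definitions:
            if other == term:
                continue
            pattern = r"\b" + re.escape(other) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                deps.add(other)
        graph[term] = deps
    return graph


# Toy definitions, invented for illustration only.
definitions = {
    "group": "a set with an associative binary operation, an identity and inverses",
    "abelian group": "a group whose operation is commutative",
    "ring": "an abelian group with a second associative operation "
            "that distributes over addition",
}

graph = build_dependency_graph(definitions)
for term in sorted(graph):
    print(term, "->", sorted(graph[term]))
```

In this toy case "ring" depends on both "abelian group" and "group", because the shorter term is contained in the longer one; a real system would need to decide whether such nested matches should count.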
Novelty is an inherent part of innovations and discoveries. Such processes may be considered as the appearance of new ideas or as the emergence of atypical connections between existing ones. The importance of such connections suggests investigating innovations through a network, or graph, representation of the space of ideas. In such a representation, a graph node corresponds to a relevant concept (idea), whereas an edge between two nodes means that the corresponding concepts have been used in a common context. In this study we address the question of whether it is possible to identify the edges between existing concepts where innovations may emerge. To this end, we use a well-documented scientific knowledge landscape of 1.2M arXiv.org manuscripts dated from April 2007 to September 2019. We extract relevant concepts for them using the ScienceWISE.info platform. Combining approaches developed in complex network science and graph embedding, we discuss the predictability of edges (links) on the scientific knowledge landscape where innovations may appear.
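The graph representation described above can be sketched in a few lines. The example below builds a concept graph in which two concepts are linked when they occur in the same manuscript, then ranks currently unlinked pairs by their number of shared neighbors. The common-neighbors score is a standard link-prediction baseline swapped in here for illustration; it is not claimed to be the method used in the study, and the function names and toy concept sets are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(docs):
    """Return adjacency sets: concepts are nodes, and two concepts share an
    edge if they appear together in at least one document."""
    adj = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def common_neighbor_scores(adj):
    """Score each unlinked concept pair by its number of shared neighbors.

    Pairs with no shared neighbor are omitted.
    """
    scores = {}
    for a, b in combinations(sorted(adj), 2):
        if b not in adj[a]:
            shared = len(adj[a] & adj[b])
            if shared:
                scores[(a, b)] = shared
    return scores


# Toy manuscripts, each reduced to a set of concepts (invented for illustration).
docs = [
    {"graph embedding", "link prediction"},
    {"link prediction", "complex networks"},
    {"graph embedding", "innovation"},
    {"innovation", "complex networks"},
]

adj = build_concept_graph(docs)
scores = common_neighbor_scores(adj)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

Higher-scoring pairs are candidate edges, that is, concept combinations that have not yet appeared together but are closely connected through other concepts.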
The arXiv has collected 1.5 million pre-print articles over 28 years, hosting literature from scientific fields including Physics, Mathematics, and Computer Science. Each pre-print features text, figures, authors, citations, categories, and other metadata. These rich, multi-modal features, combined with the natural graph structure created by citation, affiliation, and co-authorship, make the arXiv an exciting candidate for benchmarking next-generation models. Here we take the first necessary steps toward this goal by providing a pipeline that standardizes and simplifies access to the arXiv's publicly available data. We use this pipeline to extract and analyze a 6.7 million edge citation graph, together with an 11 billion word corpus of full-text research articles. We present some baseline classification results and motivate the application of more exciting generative graph models.
The LHC will probe the nature of the vacuum that determines the properties of particles and the forces between them. Of particular importance is the fact that our current theories allow the Universe to be trapped in a metastable vacuum, which may decay in the distant future, changing the nature of matter. This could be the case in the Standard Model if the LHC finds the Higgs boson to be light. Supersymmetry is one favoured extension of the Standard Model which one might invoke to try to avoid such instability. However, many supersymmetric models are also condemned to vacuum decay for different reasons. The LHC will be able to distinguish between different supersymmetric models, thereby testing the stability of the vacuum, and foretelling the fate of the Universe.
Below we analyze the critical statements made in the preprint arXiv:1301.1828v1 [nucl-th]. The questionable scientific argumentation of the authors of that preprint is also discussed.