Software Heritage is the largest existing public archive of software source code and accompanying development history. It spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. These software artifacts were retrieved from major collaborative development platforms (e.g., GitHub, GitLab) and package repositories (e.g., PyPI, Debian, NPM), and stored in a uniform representation linking together source code files, directories, commits, and full snapshots of version control system (VCS) repositories as observed by Software Heritage during periodic crawls. This dataset is unique in terms of accessibility and scale, and makes it possible to explore a number of research questions on the long tail of public software development, instead of focusing solely on the most starred repositories, as is often done.
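As a rough illustration of how the archived artifacts can be reached programmatically, the following Python sketch queries one commit (a "revision" in Software Heritage terms) through the public Web API; the endpoint path, the response fields, and the placeholder identifier are assumptions to be checked against the archive's API documentation, not verified calls.

    import requests

    # Minimal sketch: look up one archived commit (revision) through the public
    # Software Heritage Web API. Endpoint path and response fields below are
    # assumptions; see https://archive.softwareheritage.org/api/ for the docs.
    API = "https://archive.softwareheritage.org/api/1"
    REVISION = "0" * 40  # hypothetical 40-hex-digit commit identifier

    resp = requests.get(f"{API}/revision/{REVISION}/")
    resp.raise_for_status()
    rev = resp.json()

    # Each revision links an author, a commit message, and the root directory
    # of the source tree, mirroring the uniform data model described above.
    print("author:", rev["author"]["name"])
    print("message:", rev["message"])
    print("root directory:", rev["directory"])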
We introduce the Software Heritage filesystem (SwhFS), a user-space filesystem that integrates large-scale open source software archival with development workflows. SwhFS provides a POSIX filesystem view of Software Heritage, the largest public archive of software source code and version control system (VCS) development history. Using SwhFS, developers can quickly check out any of the 2 billion commits archived by Software Heritage, even after they disappear from their previously known location and without incurring the performance cost of repository cloning. SwhFS works across unrelated repositories and different VCS technologies. Other source code artifacts archived by Software Heritage (individual source code files and trees, releases, and branches) can also be accessed using common programming tools and custom scripts, as if they were locally available. A screencast of SwhFS is available online at dx.doi.org/10.5281/zenodo.4531411.
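To illustrate the filesystem view, here is a hedged Python sketch that lists the files of one archived commit through a SwhFS mount; the mount point, the archive/<SWHID>/root layout, and the identifier are assumptions based on the tool's description rather than verified paths.

    import os

    # Minimal sketch: browse an archived commit through a SwhFS mount point.
    # The mount path and the archive/<SWHID>/root layout are assumptions
    # based on the SwhFS design; check the SwhFS documentation for details.
    MOUNT = "/mnt/swhfs"                  # hypothetical mount point
    SWHID = "swh:1:rev:" + "0" * 40       # hypothetical commit identifier

    root = os.path.join(MOUNT, "archive", SWHID, "root")

    # Files of the archived commit appear as ordinary files: any tool that
    # walks a directory tree works unmodified, without cloning a repository.
    for entry in sorted(os.listdir(root)):
        print(entry)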
There is growing acknowledgement within the software engineering community that a theory of software development is needed to integrate the myriad methodologies that are currently popular, some of which are based on opposing perspectives. We have been developing such a theory for a number of years. In this paper, we present an overview of our theory and report on a recent ontological analysis of its constructs. We suggest that, once fully developed, this theory, or one similar to it, could be applied to support situated software development by providing an overarching model within which software initiatives might be categorised and understood. Such understanding would, in turn, lead to greater predictability of outcomes.
Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from third-party components. Security and privacy assurance techniques, once designed for large, controlled updates rolled out over months or years, must now cope with small, continuous changes taking place within a week, happening in sub-components controlled by third-party developers one might not even know exist. In this paper, we aim to provide an overview of current software security approaches and evaluate their appropriateness in the face of the changed nature of software development. Software security assurance could benefit from switching from a process-based to an artefact-based approach. Further, security evaluation might need to become more incremental, automated, and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components, even though there might be a price to pay.
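As a deliberately simplified illustration of such a lightweight, artefact-based screening, the following Python sketch checks a dependency manifest against a known-vulnerability list; the manifest format, component names, and vulnerability feed are hypothetical placeholders rather than any real tool's data (a real screening would pull feeds such as OSV or the NVD).

    # Minimal sketch of an artefact-based screening: scan a dependency
    # manifest against a known-vulnerability list. All names and versions
    # below are hypothetical placeholders.
    KNOWN_VULNERABLE = {          # (name, version) pairs, hypothetical feed
        ("liblog", "1.2.0"),
        ("webkitlib", "0.9.1"),
    }

    def screen(manifest: dict[str, str]) -> list[str]:
        """Return the components in `manifest` that match the feed."""
        return [
            f"{name}=={version}"
            for name, version in manifest.items()
            if (name, version) in KNOWN_VULNERABLE
        ]

    # Cheap per-artefact checks like this can run on every small, continuous
    # change, instead of relying on a heavyweight process-based review.
    deps = {"liblog": "1.2.0", "parserlib": "2.4.7"}
    print(screen(deps))  # -> ['liblog==1.2.0']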
This paper describes a comprehensive prototype of large-scale fault-adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliability are designed to be self-configuring, adapting automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate the design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an 'expert system' that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group.
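To make the distributed reactive approach concrete, here is a minimal Python sketch of a lightweight agent that escalates through ordered layers of competence; the class, thresholds, and mitigation actions are illustrative assumptions, not the BTeV/RTES implementation.

    # Minimal sketch of a reactive agent: it watches one component and reacts
    # through ordered "layers of competence" instead of consulting a
    # centralized rule base. Names and thresholds are illustrative only.
    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class Agent:
        component: str
        # Ordered (trigger, mitigation) pairs; the first matching layer acts.
        layers: list[tuple[Callable[[float], bool], Callable[[], None]]] = \
            field(default_factory=list)

        def observe(self, load: float) -> None:
            for triggers, mitigate in self.layers:
                if triggers(load):
                    mitigate()    # react locally; no global expert system
                    return

    agent = Agent(
        "dsp-0421",  # hypothetical DSP node name
        layers=[
            (lambda x: x > 0.95, lambda: print("shed load: drop low-priority data")),
            (lambda x: x > 0.80, lambda: print("rebalance: move work to a neighbor")),
        ],
    )
    agent.observe(0.87)  # -> rebalance: move work to a neighbor

Keeping each agent's reactions local and layered is what lets the scheme scale to thousands of DSPs without enumerating every possible system state centrally.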
Industries in all sectors are experiencing a profound digital transformation that puts software at the core of their businesses. To react to continuously changing user requirements and dynamic markets, companies need to build robust workflows that increase their agility and keep them competitive. This increasingly rapid transformation, especially in domains like IoT or cloud computing, poses significant challenges to guaranteeing high-quality software, since dynamism and agile short-term planning reduce the ability to detect and manage risks. In this paper, we describe the main challenges related to managing risk in agile software development, building on the experience of more than 20 agile coaches who have worked continuously for 15 years with hundreds of teams across industries in all sectors. We also propose a risk-management framework that addresses those challenges and supports collaboration, agility, and continuous development. An implementation of that framework is then described in a tool that handles risks and mitigation actions associated with the development of multi-cloud applications. The methodology and the tool have been validated by a team of evaluators who were asked to consider their use in developing an urban smart mobility service and an airline flight scheduling system.
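As an illustration of the kind of artifact such a framework and tool might manage, the following Python sketch models a continuously re-ranked risk register with mitigation actions; the fields, scoring scheme, and example risks are assumptions made for illustration, not the paper's actual data model.

    # Minimal sketch of a risk register for continuous, collaborative risk
    # management. Fields and scoring are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Risk:
        description: str
        likelihood: int          # 1 (rare) .. 5 (almost certain)
        impact: int              # 1 (negligible) .. 5 (critical)
        mitigations: list[str] = field(default_factory=list)

        @property
        def exposure(self) -> int:
            return self.likelihood * self.impact

    register = [
        Risk("cloud provider API change breaks deployment", 3, 4,
             ["pin provider API versions", "add contract tests in CI"]),
        Risk("sprint scope creep crowds out security work", 4, 3,
             ["reserve fixed capacity per sprint for risk actions"]),
    ]

    # Re-ranked every iteration, so mitigation effort follows current exposure.
    for risk in sorted(register, key=lambda r: r.exposure, reverse=True):
        print(risk.exposure, risk.description)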