ﻻ يوجد ملخص باللغة العربية
GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding the utility of search and topic-based analysis. This work targets the automatic repository classification problem as textit{keyword-driven hierarchical classification}. Specifically, users only need to provide a label hierarchy with keywords to supply as supervision. This setting is flexible, adaptive to the users needs, accounts for the different granularity of topic labels and requires minimal human effort. We identify three key challenges of this problem, namely (1) the presence of multi-modal signals; (2) supervision scarcity and bias; (3) supervision format mismatch. In recognition of these challenges, we propose the textsc{HiGitClass} framework, comprising of three modules: heterogeneous information network embedding; keyword enrichment; topic modeling and pseudo document generation. Experimental results on two GitHub repository collections confirm that textsc{HiGitClass} is superior to existing weakly-supervised and dataless hierarchical classification methods, especially in its ability to integrate both structured and unstructured data for repository classification.
Software development is becoming increasingly open and collaborative with the advent of platforms such as GitHub. Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform. Previous work has mo
The Institutional Repositories (IR) have been consolidated into the institutions in scientific and academic areas, as shown by the directories existing open access repositories and the deposits daily of articles made by different ways, such as by sel
This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset, presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey.
When a group of people strives to understand new information, struggle ensues as various ideas compete for attention. Steep learning curves are surmounted as teams learn together. To understand how these team dynamics play out in software development
It has been proved that deep neural networks are facing a new threat called backdoor attacks, where the adversary can inject backdoors into the neural network model through poisoning the training dataset. When the input containing some special patter