Stack Overflow in Github: Any Snippets There?

75 0 0.0 ( 0 )

Download Cite

Added by Di Yang

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Di Yang - Pedro Martins - Vaibhav Saini

Software Engineering

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.

rate research

Generating Question Titles for Stack Overflow from Mined Code Snippets

91 - Zhipeng Gao , Xin Xia , John Grundy 2020

Stack Overflow has been heavily used by software developers as a popular way to seek programming-related information from peers via the internet. The Stack Overflow community recommends users to provide the related code snippet when they are creating a question to help others better understand it and offer their help. Previous studies have shown that} a significant number of these questions are of low-quality and not attractive to other potential experts in Stack Overflow. These poorly asked questions are less likely to receive useful answers and hinder the overall knowledge generation and sharing process. Considering one of the reasons for introducing low-quality questions in SO is that many developers may not be able to clarify and summarize the key problems behind their presented code snippets due to their lack of knowledge and terminology related to the problem, and/or their poor writing skills, in this study we propose an approach to assist developers in writing high-quality questions by automatically generating question titles for a code snippet using a deep sequence-to-sequence learning approach. Our approach is fully data-driven and uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-words problem and a coverage mechanism to eliminate word repetition problem. We evaluate our approach on Stack Overflow datasets over a variety of programming languages (e.g., Python, Java, Javascript, C# and SQL) and our experimental results show that our approach significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets to facilitate other researchers to verify their ideas and inspire the follow-up work.

Software Engineering

Code2Que: A Tool for Improving Question Titles from Mined Code Snippets in Stack Overflow

330 - Zhipeng Gao , Xin Xia , David Lo 2020

Stack Overflow is one of the most popular technical Q&A sites used by software developers. Seeking help from Stack Overflow has become an essential part of software developers daily work for solving programming-related questions. Although the Stack Overflow community has provided quality assurance guidelines to help users write better questions, we observed that a significant number of questions submitted to Stack Overflow are of low quality. In this paper, we introduce a new web-based tool, Code2Que, which can help developers in writing higher quality questions for a given code snippet. Code2Que consists of two main stages: offline learning and online recommendation. In the offline learning phase, we first collect a set of good quality <code snippet, question> pairs as training samples. We then train our model on these training samples via a deep sequence-to-sequence approach, enhanced with an attention mechanism, a copy mechanism and a coverage mechanism. In the online recommendation phase, for a given code snippet, we use the offline trained model to generate question titles to assist less experienced developers in writing questions more effectively. At the same time, we embed the given code snippet into a vector and retrieve the related questions with similar problematic code snippets.

Software Engineering

Broken External Links on Stack Overflow

413 - Jiakun Liu , Xin Xia , David Lo 2020

Stack Overflow hosts valuable programming-related knowledge with 11,926,354 links that reference to the third-party websites. The links that reference to the resources hosted outside the Stack Overflow websites extend the Stack Overflow knowledge base substantially. However, with the rapid development of programming-related knowledge, many resources hosted on the Internet are not available anymore. Based on our analysis of the Stack Overflow data that was released on Jun. 2, 2019, 14.2% of the links on Stack Overflow are broken links. The broken links on Stack Overflow can obstruct viewers from obtaining desired programming-related knowledge, and potentially damage the reputation of the Stack Overflow as viewers might regard the posts with broken links as obsolete. In this paper, we characterize the broken links on Stack Overflow. 65% of the broken links in our sampled questions are used to show examples, e.g., code examples. 70% of the broken links in our sampled answers are used to provide supporting information, e.g., explaining a certain concept and describing a step to solve a problem. Only 1.67% of the posts with broken links are highlighted as such by viewers in the posts comments. Only 5.8% of the posts with broken links removed the broken links. Viewers cannot fully rely on the vote scores to detect broken links, as broken links are common across posts with different vote scores. The websites that host resources that can be maintained by their users are referenced by broken links the most on Stack Overflow -- a prominent example of such websites is GitHub. The posts and comments related to the web technologies, i.e., JavaScript, HTML, CSS, and jQuery, are associated with more broken links. Based on our findings, we shed lights for future directions and provide recommendations for practitioners and researchers.

Software Engineering

Mining Architecture Tactics and Quality Attributes Knowledge in Stack Overflow

177 - Tingting Bi , Peng Liang , Antony Tang 2021

Context: Architecture Tactics (ATs) are architectural building blocks that provide general architectural solutions for addressing Quality Attributes (QAs) issues. Mining and analyzing QA-AT knowledge can help the software architecture community better understand architecture design. However, manually capturing and mining this knowledge is labor-intensive and difficult. Objective: Using Stack Overflow (SO) as our source, our main goals are to effectively mine such knowledge; and to have some sense of how developers use ATs with respect to QA concerns from related discussions. Methods: We applied a semi-automatic dictionary-based mining approach to extract the QA-AT posts in SO. With the mined QA-AT posts, we identified the relationships between ATs and QAs. Results: Our approach allow us to mine QA-AT knowledge effectively with an F-measure of 0.865 and Performance of 82.2%. Using this mining approach, we are able to discover architectural synonyms of QAs and ATs used by designers, from which we discover how developers apply ATs to address quality requirements. Conclusions: We make two contributions in this work: First, we demonstrated a semi-automatic approach to mine ATs and QAs from SO posts; Second, we identified little-known design relationships between QAs and ATs and grouped architectural design considerations to aid architects make architecture tactics design decisions.

Software Engineering

An Exploratory Study on the Repeatedly Shared External Links on Stack Overflow

111 - Jiakun Liu , Haoxiang Zhang , Xin Xia 2021

On Stack Overflow, users reuse 11,926,354 external links to share the resources hosted outside the Stack Overflow website. The external links connect to the existing programming-related knowledge and extend the crowdsourced knowledge on Stack Overflow. Some of the external links, so-called as repeated external links, can be shared for multiple times. We observe that 82.5% of the link sharing activities (i.e., sharing links in any question, answer, or comment) on Stack Overflow share external resources, and 57.0% of the occurrences of the external links are sharing the repeated external links. However, it is still unclear what types of external resources are repeatedly shared. To help users manage their knowledge, we wish to investigate the characteristics of the repeated external links in knowledge sharing on Stack Overflow. In this paper, we analyze the repeated external links on Stack Overflow. We observe that external links that point to the text resources (hosted in documentation websites, tutorial websites, etc.) are repeatedly shared the most. We observe that: 1) different users repeatedly share the same knowledge in the form of repeated external links, thus increasing the maintenance effort of knowledge (e.g., update invalid links in multiple posts), 2) the same users can repeatedly share the external links for the purpose of promotion, and 3) external links can point to webpages with an overload of information that is difficult for users to retrieve relevant information. Our findings provide insights to Stack Overflow moderators and researchers. For example, we encourage Stack Overflow to centrally manage the commonly occurring knowledge in the form of repeated external links in order to better maintain the crowdsourced knowledge on Stack Overflow.

Software Engineering