Bag-of-Words Baselines for Semantic Code Search

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: semantic code search, code search, semantic code

Abstract

The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language. The semantic gap between natural language and programming languages has long been regarded as one of the most significant obstacles to the effectiveness of keyword-based information retrieval (IR) methods. It is a common assumption that "traditional" bag-of-words IR methods are poorly suited for semantic code search: our work empirically investigates this assumption. Specifically, we examine the effectiveness of two traditional IR methods, namely BM25 and RM3, on the CodeSearchNet Corpus, which consists of natural language queries paired with relevant code snippets. We find that the two keyword-based methods outperform several pre-BERT neural models. We also compare several code-specific data pre-processing strategies and find that specialized tokenization improves effectiveness.
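The combination of a BM25 baseline with code-aware tokenization can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `code_tokenize` splitting rules (non-alphanumeric separators, snake_case, camelCase), the toy snippets, and the parameters `k1=0.9`, `b=0.4` are choices made here for illustration, and RM3 query expansion is not shown.

```python
import math
import re
from collections import Counter


def code_tokenize(text: str) -> list[str]:
    """Split on non-alphanumerics (covers snake_case), then on camelCase; lowercase.

    These splitting rules are an illustrative assumption, not the paper's exact pipeline.
    """
    tokens = []
    for chunk in re.findall(r"[A-Za-z0-9]+", text):
        # "parseJsonFile" -> parse, Json, File; "HTTPServer" -> HTTP, Server
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
        tokens.extend(p.lower() for p in parts)
    return tokens


class BM25:
    """Plain Okapi BM25 over pre-tokenized documents (k1, b are assumed values)."""

    def __init__(self, docs: list[list[str]], k1: float = 0.9, b: float = 0.4):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]
        self.lens = [len(d) for d in docs]
        self.avgdl = sum(self.lens) / len(docs)
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Lucene-style IDF, which is always non-negative
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: list[str], i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)  # length normalization
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )

    def rank(self, query: list[str]) -> list[int]:
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


snippets = [
    "def read_json_file(path):\n    with open(path) as f:\n        return json.load(f)",
    "public int addNumbers(int a, int b) { return a + b; }",
    "def parseHttpHeader(line): return line.split(':', 1)",
]
index = BM25([code_tokenize(s) for s in snippets])
ranking = index.rank(code_tokenize("read a json file"))
print(ranking[0])  # 0
```

Splitting `read_json_file` into `read`, `json`, and `file` is what lets the natural-language words in the query match the identifier; with plain whitespace tokenization the identifier would stay glued to its punctuation and match none of the query words.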

Related research

Grouping Words with Semantic Diversity

Association for Computational Linguistics, 2021 (article)

Deep learning-based NLP systems can be sensitive to unseen tokens and hard to train with high-dimensional inputs, which critically hinders generalization. We introduce an approach that groups input words based on their semantic diversity to simplify the input language representation with low ambiguity. Since semantically diverse words reside in different contexts, we can substitute words with their groups and still distinguish word meanings from their contexts. We design several algorithms that compute diverse groupings based on random sampling, geometric distances, and entropy maximization, and we prove formal guarantees for the entropy-based algorithms. Experimental results show that our methods improve the generalization of NLP models, with higher accuracy on POS tagging and language modeling tasks and significant improvements on medium-scale machine translation tasks, of up to +6.5 BLEU points. Our source code is available at https://github.com/abdulrafae/dg.

Keywords: semantic diversity, deep learning-based NLP, learning-based NLP systems

UniParma at SemEval-2021 Task 5: Toxic Spans Detection Using CharacterBERT and Bag-of-Words Model

Association for Computational Linguistics, 2021 (article)

With the ever-increasing availability of digital information, toxic content is also on the rise, which makes detecting this type of language highly important. We tackle this problem with a combination of a state-of-the-art pre-trained language model (CharacterBERT) and a traditional bag-of-words technique. Since the content is full of toxic words that are not written with their dictionary spelling, attention to individual characters is crucial. We therefore use CharacterBERT to extract features based on word characters: it consists of a CharacterCNN module that learns character embeddings from the context, and these embeddings are then fed into the well-known BERT architecture. The bag-of-words method further improves on this by ensuring that some frequently used toxic words are labeled accordingly. With a difference of about 4 percent from the first-ranked team, our system ranked 36th in the competition. The code is available for further research and reproduction of the results.
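The bag-of-words step described above can be illustrated with a short Python sketch. It assumes predictions are represented as sets of character offsets (the format used by the SemEval-2021 Toxic Spans task); the lexicon contents, the `\w+` word pattern, and the rule of taking the union with the model's offsets are hypothetical choices for illustration, not the UniParma system's code.

```python
import re


def lexicon_offsets(text: str, lexicon: set[str]) -> set[int]:
    """Character offsets of every word in `text` whose lowercase form is in `lexicon`."""
    offsets: set[int] = set()
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in lexicon:
            offsets.update(range(match.start(), match.end()))
    return offsets


def merge_with_lexicon(model_offsets: set[int], text: str, lexicon: set[str]) -> list[int]:
    """Union of model-predicted offsets and lexicon matches (the lexicon can only add offsets)."""
    return sorted(model_offsets | lexicon_offsets(text, lexicon))


text = "you are an idiot and a fool"
model_offsets = set(range(23, 27))  # stand-in for neural predictions: covers "fool"
lexicon = {"idiot"}                 # hypothetical lexicon of frequent toxic words
print(merge_with_lexicon(model_offsets, text, lexicon))
# [11, 12, 13, 14, 15, 23, 24, 25, 26]
```

Taking the union means the lexicon acts only as a recall booster: it can label a frequent toxic word the model missed, but it never removes a span the model predicted.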

Keywords: toxic spans

Contextualized Query Embeddings for Conversational Search

Association for Computational Linguistics, 2021 (article)

This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations. Prior to our work, the state-of-the-art approach used a multi-stage pipeline comprising conversational query reformulation and information retrieval modules. Despite its effectiveness, such a pipeline often includes multiple neural models that require long inference times. In addition, optimizing each module independently ignores the dependencies among them. To address these shortcomings, we propose to integrate conversational query reformulation directly into a dense retrieval model. To this end, we create a dataset with pseudo-relevance labels for conversational search to overcome the lack of training data and to explore different training strategies. We demonstrate that our model effectively rewrites conversational queries as dense representations on conversational search and open-domain question answering datasets. Finally, after observing that our model learns to adjust the L2 norm of query token embeddings, we leverage this property for hybrid retrieval and to support error analysis.
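The abstract does not say how dense and sparse signals are combined for hybrid retrieval. One common generic approach is linear interpolation of normalized scores, sketched below in Python; min-max normalization, the weight `alpha`, the zero score for passages missing from one list, and the toy passage IDs are all illustrative assumptions, not the authors' method, and the L2-norm property they mention is not modeled.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Rank passage IDs by alpha * dense + (1 - alpha) * sparse after normalization.

    A passage missing from one list contributes 0 for that signal (an assumption).
    """
    d, s = min_max(dense), min_max(sparse)
    fused = {
        pid: alpha * d.get(pid, 0.0) + (1 - alpha) * s.get(pid, 0.0)
        for pid in d.keys() | s.keys()
    }
    # Sort by fused score, breaking ties by ID so the output is deterministic
    return sorted(fused, key=lambda pid: (-fused[pid], pid))


dense = {"p1": 0.9, "p2": 0.7, "p3": 0.1}    # e.g. query-passage dot products
sparse = {"p2": 12.0, "p3": 3.0, "p4": 8.0}  # e.g. BM25 scores
print(hybrid_rank(dense, sparse))  # ['p2', 'p1', 'p4', 'p3']
```

Normalizing first matters because raw dot products and BM25 scores live on very different scales; without it, the sparse scores in this toy example would dominate any interpolation.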

Keywords: conversational search, contextualized query embeddings

Integrating Semantic Scenario and Word Relations for Abstractive Sentence Summarization

Association for Computational Linguistics, 2021 (article)

Graph-based methods have recently been adopted for abstractive text summarization. However, existing graph-based methods consider either word relations or structure information, neglecting the correlation between them. To capture word relations and structure information from sentences simultaneously, we propose a novel Dual Graph network for abstractive sentence summarization. Specifically, we first construct a semantic scenario graph and a semantic word relation graph based on FrameNet, then learn their representations and design a graph fusion method that strengthens their correlation and yields better semantic representations for summary generation. Experimental results show that our model outperforms existing state-of-the-art methods on two popular benchmark datasets, Gigaword and DUC 2004.

Keywords: abstractive text summarization, sentence summarization

Learning to Organize a Bag of Words into Sentences with Neural Networks: An Empirical Study

Association for Computational Linguistics, 2021 (article)

Sequential information, also known as order, is assumed to be essential for processing a sequence with recurrent or convolutional neural network-based encoders. However, is it possible to encode natural language without order? Given a bag of words from a disordered sentence, humans may still be able to understand what those words mean by reordering or reconstructing them. Inspired by this intuition, we investigate how "order" information affects natural language learning. Through comprehensive experiments, we quantitatively compare the ability of several representative neural models to organize sentences from a bag of words under three typical scenarios, and we summarize empirical findings and challenges that can inform future research in this line of work.
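The premise of this study, that a bag of words discards order, can be seen in a few lines of Python: two sentences with opposite meanings yield identical word counts. The example sentences and the lowercase whitespace tokenization are hypothetical choices for illustration.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased whitespace tokens with their counts; word order is discarded."""
    return Counter(sentence.lower().split())


a = "The dog bit the man"
b = "The man bit the dog"
print(a == b)                              # False: different sentences
print(bag_of_words(a) == bag_of_words(b))  # True: same bag of words
```

Recovering which of the two orderings was intended is exactly the reconstruction problem the models are asked to solve.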

Keywords: reinforcement learning
