Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

44 0 0.0 ( 0 )

Download Cite

Added by Manuel Cebri\\'an

Publication date 2007

fields Informatics Engineering

and research's language is English

Authors Manuel Cebrian - Manuel Alfonseca - Alfonso Ortega

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper has been withdrawn by the authors due to a major rewriting.

rate research

Grammatical Evolution with Restarts for Fast Fractal Generation

485 - Manuel Cebrian , Manuel Alfonseca , Alfonso Ortega 2010

In a previous work, the authors proposed a Grammatical Evolution algorithm to automatically generate Lindenmayer Systems which represent fractal curves with a pre-determined fractal dimension. This paper gives strong statistical evidence that the probability distributions of the execution time of that algorithm exhibits a heavy tail with an hyperbolic probability decay for long executions, which explains the erratic performance of different executions of the algorithm. Three different restart strategies have been incorporated in the algorithm to mitigate the problems associated to heavy tail distributions: the first assumes full knowledge of the execution time probability distribution, the second and third assume no knowledge. These strategies exploit the fact that the probability of finding a solution in short executions is non-negligible and yield a severe reduction, both in the expected execution time (up to one order of magnitude) and in its variance, which is reduced from an infinite to a finite value.

Neural and Evolutionary Computing Symbolic Computation

PonyGE2: Grammatical Evolution in Python

49 - Michael Fenton , James McDermott , David Fagan 2017

Grammatical Evolution (GE) is a population-based evolutionary algorithm, where a formal grammar is used in the genotype to phenotype mapping process. PonyGE2 is an open source implementation of GE in Python, developed at UCDs Natural Computing Research and Applications group. It is intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and a Python workout. As well as providing the characteristic genotype to phenotype mapping of GE, a search algorithm engine is also provided. A number of sample problems and tutorials on how to use and adapt PonyGE2 have been developed.

Neural and Evolutionary Computing

Automatic Generation of High-Coverage Tests for RTL Designs using Software Techniques and Tools

59 - Yu Zhang , Wenlong Feng , Mengxing Huang 2016

Register Transfer Level (RTL) design validation is a crucial stage in the hardware design process. We present a new approach to enhancing RTL design validation using available software techniques and tools. Our approach converts the source code of a RTL design into a C++ software program. Then a powerful symbolic execution engine is employed to execute the converted C++ program symbolically to generate test cases. To better generate efficient test cases, we limit the number of cycles to guide symbolic execution. Moreover, we add bit-level symbolic variable support into the symbolic execution engine. Generated test cases are further evaluated by simulating the RTL design to get accurate coverage. We have evaluated the approach on a floating point unit (FPU) design. The preliminary results show that our approach can deliver high-quality tests to achieve high coverage.

Software Engineering Hardware Architecture

Tools and Benchmarks for Automated Log Parsing

160 - Jieming Zhu , Shilin He , Jinyang Liu 2018

Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The increasing scale and complexity of modern software systems, however, make the volume of logs explodes. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers by different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.

Software Engineering

Corpora Generation for Grammatical Error Correction

142 - Jared Lichtarge , Chris Alberti , Shankar Kumar 2019

Grammatical Error Correction (GEC) has been recently modeled using the sequence-to-sequence framework. However, unlike sequence transduction problems such as machine translation, GEC suffers from the lack of plentiful parallel data. We describe two approaches for generating large parallel datasets for GEC using publicly available Wikipedia data. The first method extracts source-target pairs from Wikipedia edit histories with minimal filtration heuristics, while the second method introduces noise into Wikipedia sentences via round-trip translation through bridge languages. Both strategies yield similar sized parallel corpora containing around 4B tokens. We employ an iterative decoding strategy that is tailored to the loosely supervised nature of our constructed corpora. We demonstrate that neural GEC models trained using either type of corpora give similar performance. Fine-tuning these models on the Lang-8 corpus and ensembling allows us to surpass the state of the art on both the CoNLL-2014 benchmark and the JFLEG task. We provide systematic analysis that compares the two approaches to data generation and highlights the effectiveness of ensembling.

Computation and Language Machine Learning

comments

Fetching comments

Mustansiriyah University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

Ask ChatGPT about the research

No Arabic abstract

This paper has been withdrawn by the authors due to a major rewriting.

Read More