بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

44 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Manuel Cebri\\'an

تاريخ النشر 2007

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Manuel Cebrian - Manuel Alfonseca - Alfonso Ortega

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper has been withdrawn by the authors due to a major rewriting.

قيم البحث

470 - Manuel Cebrian , Manuel Alfonseca , Alfonso Ortega 2010

In a previous work, the authors proposed a Grammatical Evolution algorithm to automatically generate Lindenmayer Systems which represent fractal curves with a pre-determined fractal dimension. This paper gives strong statistical evidence that the pro bability distributions of the execution time of that algorithm exhibits a heavy tail with an hyperbolic probability decay for long executions, which explains the erratic performance of different executions of the algorithm. Three different restart strategies have been incorporated in the algorithm to mitigate the problems associated to heavy tail distributions: the first assumes full knowledge of the execution time probability distribution, the second and third assume no knowledge. These strategies exploit the fact that the probability of finding a solution in short executions is non-negligible and yield a severe reduction, both in the expected execution time (up to one order of magnitude) and in its variance, which is reduced from an infinite to a finite value.

الحوسبة العصبية والتطورية الحساب الرمزي

PonyGE2: Grammatical Evolution in Python

49 - Michael Fenton , James McDermott , David Fagan 2017

Grammatical Evolution (GE) is a population-based evolutionary algorithm, where a formal grammar is used in the genotype to phenotype mapping process. PonyGE2 is an open source implementation of GE in Python, developed at UCDs Natural Computing Resear ch and Applications group. It is intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and a Python workout. As well as providing the characteristic genotype to phenotype mapping of GE, a search algorithm engine is also provided. A number of sample problems and tutorials on how to use and adapt PonyGE2 have been developed.

الحوسبة العصبية والتطورية

Automatic Generation of High-Coverage Tests for RTL Designs using Software Techniques and Tools

59 - Yu Zhang , Wenlong Feng , Mengxing Huang 2016

Register Transfer Level (RTL) design validation is a crucial stage in the hardware design process. We present a new approach to enhancing RTL design validation using available software techniques and tools. Our approach converts the source code of a RTL design into a C++ software program. Then a powerful symbolic execution engine is employed to execute the converted C++ program symbolically to generate test cases. To better generate efficient test cases, we limit the number of cycles to guide symbolic execution. Moreover, we add bit-level symbolic variable support into the symbolic execution engine. Generated test cases are further evaluated by simulating the RTL design to get accurate coverage. We have evaluated the approach on a floating point unit (FPU) design. The preliminary results show that our approach can deliver high-quality tests to achieve high coverage.

هندسة البرمجيات هندسة العتاد

Tools and Benchmarks for Automated Log Parsing

160 - Jieming Zhu , Shilin He , Jinyang Liu 2018

Logs are imperative in the development and maintenance process of many software systems. They record detailed runtime information that allows developers and support engineers to monitor their systems and dissect anomalous behaviors and errors. The in creasing scale and complexity of modern software systems, however, make the volume of logs explodes. In many cases, the traditional way of manual log inspection becomes impractical. Many recent studies, as well as industrial tools, resort to powerful text search and machine learning-based analytics solutions. Due to the unstructured nature of logs, a first crucial step is to parse log messages into structured data for subsequent analysis. In recent years, automated log parsing has been widely studied in both academia and industry, producing a series of log parsers by different techniques. To better understand the characteristics of these log parsers, in this paper, we present a comprehensive evaluation study on automated log parsing and further release the tools and benchmarks for easy reuse. More specifically, we evaluate 13 log parsers on a total of 16 log datasets spanning distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. We report the benchmarking results in terms of accuracy, robustness, and efficiency, which are of practical importance when deploying automated log parsing in production. We also share the success stories and lessons learned in an industrial application at Huawei. We believe that our work could serve as the basis and provide valuable guidance to future research and deployment of automated log parsing.

هندسة البرمجيات

Corpora Generation for Grammatical Error Correction

142 - Jared Lichtarge , Chris Alberti , Shankar Kumar 2019

Grammatical Error Correction (GEC) has been recently modeled using the sequence-to-sequence framework. However, unlike sequence transduction problems such as machine translation, GEC suffers from the lack of plentiful parallel data. We describe two a pproaches for generating large parallel datasets for GEC using publicly available Wikipedia data. The first method extracts source-target pairs from Wikipedia edit histories with minimal filtration heuristics, while the second method introduces noise into Wikipedia sentences via round-trip translation through bridge languages. Both strategies yield similar sized parallel corpora containing around 4B tokens. We employ an iterative decoding strategy that is tailored to the loosely supervised nature of our constructed corpora. We demonstrate that neural GEC models trained using either type of corpora give similar performance. Fine-tuning these models on the Lang-8 corpus and ensembling allows us to surpass the state of the art on both the CoNLL-2014 benchmark and the JFLEG task. We provide systematic analysis that compares the two approaches to data generation and highlights the effectiveness of ensembling.

الحساب واللغة التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الإسلامية في لبنان

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

This paper has been withdrawn by the authors due to a major rewriting.

اقرأ أيضاً