Fanfiction is a promising data source for research in NLP, education, and social science. However, answering specific research questions with this data is difficult, since fanfiction contains more diverse writing styles than formal fiction. We present a text processing pipeline for fanfiction, with a focus on identifying text associated with characters. The pipeline includes modules for character identification and coreference, as well as the attribution of quotes and narration to those characters. Additionally, the pipeline contains a novel approach to character coreference that uses knowledge from quote attribution to resolve pronouns within quotes. For each module, we evaluate the effectiveness of various approaches on 10 annotated fanfiction stories. This pipeline outperforms tools developed for formal fiction on the tasks of character coreference and quote attribution.
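The idea of letting quote attribution inform coreference can be illustrated with a minimal sketch. It assumes each quote has already been attributed to a speaker by an earlier step, and it applies one simple rule: a first-person pronoun inside a quote refers to that quote's speaker. The `Quote` class, the `resolve_quote_pronouns` function, the pronoun list, and the example character name are hypothetical and chosen for illustration. This is not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule: first-person pronouns inside a quote refer to the quote's speaker.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


@dataclass
class Quote:
    text: str
    # Speaker assigned by a quote-attribution step (assumed given; None if unattributed).
    speaker: Optional[str]


def resolve_quote_pronouns(quote: Quote) -> list[tuple[int, str, str]]:
    """Return (token_index, pronoun, character) for each first-person pronoun in the quote.

    Quotes without an attributed speaker yield no resolutions.
    """
    if quote.speaker is None:
        return []
    # Simple word tokenization; punctuation is dropped.
    tokens = re.findall(r"\w+", quote.text)
    return [
        (i, tok, quote.speaker)
        for i, tok in enumerate(tokens)
        if tok.lower() in FIRST_PERSON
    ]


if __name__ == "__main__":
    q = Quote(text="I told you my wand was broken.", speaker="Hermione")
    for idx, pronoun, character in resolve_quote_pronouns(q):
        print(idx, pronoun, "->", character)
```

Running the example links "I" (token 0) and "my" (token 3) to the attributed speaker. The sketch handles only first-person pronouns. Resolving "you" would also require knowing who is being addressed, which this example does not model.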