
Discourse-Based Sentence Splitting


Publication date: 2021
Language: English
 Created by Shamra Editor





Sentence splitting involves the segmentation of a sentence into two or more shorter sentences. It is a key component of sentence simplification, has been shown to help human comprehension, and is a useful preprocessing step for NLP tasks such as summarisation and relation extraction. While several methods and datasets have been proposed for developing sentence splitting models, little attention has been paid to how sentence splitting interacts with discourse structure. In this work, we focus on cases where the input text contains a discourse connective, which we refer to as discourse-based sentence splitting. We create synthetic and organic datasets for discourse-based splitting and explore different ways of combining these datasets using different model architectures. We show that pipeline models, which use discourse structure to mediate sentence splitting, outperform end-to-end models in learning the various ways of expressing a discourse relation but generate text that is less grammatical; that large-scale synthetic data provides a better basis for learning than smaller-scale organic data; and that training on discourse-focused, rather than general, sentence splitting data provides a better basis for discourse splitting.
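To make the task concrete, the following is a minimal rule-based sketch of discourse-based sentence splitting. It is not the method of the paper, which trains pipeline and end-to-end models on synthetic and organic data. The rule table `RULES`, the function `split_on_connective`, and the chosen connectives and adverbials are all hypothetical and exist only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical hand-written rules (an assumption, not taken from the paper):
# connective -> (relation label, adverbial for the second sentence, swap clauses?)
RULES = {
    "because": ("cause", "As a result", True),
    "although": ("concession", "However", True),
    "but": ("contrast", "However", False),
    "so": ("result", "Therefore", False),
}


def _as_sentence(clause: str) -> str:
    """Capitalise the first character and end the clause with a full stop."""
    clause = clause.strip().rstrip(".,;")
    return clause[0].upper() + clause[1:] + "."


def split_on_connective(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split a sentence at the first rule whose connective occurs mid-sentence.

    Returns (relation, first_sentence, second_sentence), or None when no
    connective from RULES is found between two non-empty clauses.
    """
    for connective, (relation, adverbial, swap) in RULES.items():
        pattern = rf",?\s+{re.escape(connective)}\s+"
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match is None:
            continue
        left = sentence[: match.start()].strip()
        right = sentence[match.end():].strip()
        if not left or not right:
            continue
        # For "A because B" the cause B is stated first, then the result A.
        first, second = (right, left) if swap else (left, right)
        second = second.rstrip(".,;")
        # Naive lowercasing: wrong for proper nouns and the pronoun "I".
        second = second[0].lower() + second[1:]
        return relation, _as_sentence(first), f"{adverbial}, {second}."
    return None


if __name__ == "__main__":
    print(split_on_connective("She stayed home because it was raining."))
    print(split_on_connective("The plan was risky, but the team approved it."))
```

A sketch like this covers one fixed way of expressing each relation. The abstract's point is that learned models must handle the many alternative ways a discourse relation can be expressed, which hand-written rules of this kind do not capture.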



References used
https://aclanthology.org/

Read More

Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional, by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head, covering long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
Most existing studies of language use in social media content have focused on the surface-level linguistic features (e.g., function words and punctuation marks) and the semantic-level aspects (e.g., the topics, sentiment, and emotions) of the comments. The writer's strategies for constructing and connecting text segments have not been widely explored, even though this knowledge is expected to shed light on how people reason in online environments. Contributing to this direction of analysis for social media studies, we build an openly accessible neural RST parsing system that analyzes discourse relations in an online comment. Our experiments demonstrate that this system achieves performance comparable to that of other neural RST parsing systems. To demonstrate the use of this tool in social media analysis, we apply it to identify the discourse relations in persuasive and non-persuasive comments and examine the relationships among the binary discourse tree depth, discourse relations, and the perceived persuasiveness of online comments. Our work demonstrates the potential of analyzing discourse structures of online comments with our system and the implications of these structures for understanding online communications.
Dominant sentence ordering models can be classified into pairwise ordering models and set-to-sequence models. However, there has been little attempt to combine these two types of models, which intuitively possess complementary advantages. In this paper, we propose a novel sentence ordering framework which introduces two classifiers to make better use of pairwise orderings for graph-based sentence ordering (Yin et al. 2019, 2021). Specifically, given an initial sentence-entity graph, we first introduce a graph-based classifier to predict pairwise orderings between linked sentences. Then, in an iterative manner, based on the graph updated by previously predicted high-confidence pairwise orderings, another classifier is used to predict the remaining uncertain pairwise orderings. Finally, we adapt a GRN-based sentence ordering model (Yin et al. 2019, 2021) on the basis of the final graph. Experiments on five commonly used datasets demonstrate the effectiveness and generality of our model. In particular, when equipped with BERT (Devlin et al. 2019) and FHDecoder (Yin et al. 2020), our model achieves state-of-the-art performance. Our code is available at https://github.com/DeepLearnXMU/IRSEG.
We investigate how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied to transformer-based architectures. As transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed between many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module which explicitly encourages sharpness of attention weights can significantly outperform existing methods.
The aim of this investigation is to explore the main rhetorical features of Arabic newspaper discourse. To this end, extracts from two popular Jordanian newspapers were analyzed. The results of this study indicate that one of the features of this type of discourse is redundancy, i.e. repetition of the same lexical item. Another feature is the explicit use of evaluative statements to support the writer’s point of view. Moreover, the results of this study revealed that Arabic newspaper discourse clearly marks clause relations, especially subordinating clauses, and that discourse markers are mainly used to mark the relationships of contrast between or among propositions in this type of discourse.
