No Arabic abstract
The writing process consists of several stages such as drafting, revising, editing, and proofreading. Studies on writing assistance, such as grammatical error correction (GEC), have mainly focused on sentence editing and proofreading, where surface-level issues such as typographical, spelling, or grammatical errors should be corrected. We broaden this focus to include the earlier revising stage, where sentences require adjustment to the information included or major rewriting and propose Sentence-level Revision (SentRev) as a new writing assistance task. Well-performing systems in this task can help inexperienced authors by producing fluent, complete sentences given their rough, incomplete drafts. We build a new freely available crowdsourced evaluation dataset consisting of incomplete sentences authored by non-native writers paired with their fin
The advent of large pre-trained language models has made it possible to make high-quality predictions on how to add or change a sentence in a document. However, the high branching factor inherent to text generation impedes the ability of even the strongest language models to offer useful editing suggestions at a more global or document level. We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise. These drafts are built from sets of documents that overlap in form - sharing large segments of potentially reusable text - while diverging in content. To support this task, we introduce a Wikipedia-based dataset of analogous documents and investigate the application of weakly supervised methods, including use of a transformer-based mixture of experts, together with reinforcement learning. We report experiments using automated and human evaluation methods and discuss relative merits of these models.
Language technologies play a key role in assisting people with their writing. Although there has been steady progress in e.g., grammatical error correction (GEC), human writers are yet to benefit from this progress due to the high development cost of integrating with writing software. We propose TEASPN, a protocol and an open-source framework for achieving integrated writing assistance environments. The protocol standardizes the way writing software communicates with servers that implement such technologies, allowing developers and researchers to integrate the latest developments in natural language processing (NLP) with low cost. As a result, users can enjoy the integrated experience in their favorite writing software. The results from experiments with human participants show that users use a wide range of technologies and rate their writing experience favorably, allowing them to write more fluent text.
Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from the continuous latent space does not explicitly model the syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our proposed method explicitly models syntactic information in the VAEs latent space by using the linearized tree sequence, leading to better performance of language generation. Additionally, the advantage of sampling in the disentangled syntactic and semantic latent spaces enables us to perform novel applications, such as the unsupervised paraphrase generation and syntax-transfer generation. Experimental results show that our proposed model achieves similar or better performance in various tasks, compared with state-of-the-art related work.
Stuttering is a speech disorder which impacts the personal and professional lives of millions of people worldwide. To save themselves from stigma and discrimination, people who stutter (PWS) may adopt different strategies to conceal their stuttering. One of the common strategies is word substitution where an individual avoids saying a word they might stutter on and use an alternative instead. This process itself can cause stress and add more burden. In this work, we present Fluent, an AI augmented writing tool which assists PWS in writing scripts which they can speak more fluently. Fluent embodies a novel active learning based method of identifying words an individual might struggle pronouncing. Such words are highlighted in the interface. On hovering over any such word, Fluent presents a set of alternative words which have similar meaning but are easier to speak. The user is free to accept or ignore these suggestions. Based on such user interaction (feedback), Fluent continuously evolves its classifier to better suit the personalized needs of each user. We evaluated our tool by measuring its ability to identify difficult words for 10 simulated users. We found that our tool can identify difficult words with a mean accuracy of over 80% in under 20 interactions and it keeps improving with more feedback. Our tool can be beneficial for certain important life situations like giving a talk, presentation, etc. The source code for this tool has been made publicly accessible at github.com/bhavyaghai/Fluent.
Context: The algorithms for generating a safe fluent API are actively studied these years. A safe fluent API is the fluent API that reports incorrect chaining of the API methods as a type error to the API users. Although such a safe property improves the productivity of its users, the construction of a safe fluent API is too complicated for the developers. The generation algorithms are studied to reduce the development cost of a safe fluent API. The study on the generation would benefit a number of programmers since a fluent API is a popular design in the real world. Inquiry: The generation of a generic fluent API has been left untackled. A generic fluent API refers to the fluent API that provides generic methods (methods that contain type parameters in their definitions). The Stream API in Java is an example of such a generic API. The recent research on the safe fluent API generation rather focuses on the grammar class that the algorithm can deal with for syntax checking. The key idea of the previous study is to use nested generics to represent a stack structure for the parser built on top of the type system. In that idea, the role of a type parameter was limited to internally representing a stack element of that parser on the type system. The library developers could not use type parameters to include a generic method in their API so that the semantic constraints for their API would be statically checked, for example, the type constraint on the items passed through a stream. Approach: We propose an algorithm to generate a generic fluent API. Our translation algorithm is modeled as the construction of deterministic finite automaton (DFA) with type parameter information. Each state of the DFA holds information about which type parameters are already bound in that state. This information is used to identify whether a method invocation in a chain newly binds a type to a type parameter, or refers to a previously bound type. The identification is required since a type parameter in a chain is bound at a particular method invocation, and that bound type is referred to in the following method invocations. Our algorithm constructs the DFA by analyzing the binding time of type parameters and their propagation among the states in a DFA that is naively constructed from the given grammar. Knowledge and Importance: Our algorithm helps library developers to develop a generic fluent API. The ability to generate a generic fluent API is essential to bring the safe fluent API generation to the real world since the use of type parameters is a common technique in the library API design. By our algorithm, the generation of a safe fluent API will be ready for practical use. Grounding: We implemented a generator named Protocool to demonstrate our algorithm. We also generated several libraries using Protocool to show the ability and the limitations of our algorithm.